





ADVANCED 

CALCULUS 















Third Edition 


ANGUS E. TAYLOR 

University of California 

W. ROBERT MANN 

University of North Carolina 


JOHN WILEY & SONS, INC. 

New York · Chichester · Brisbane · Toronto · Singapore




Copyright © 1955, 1972, 1983, by John Wiley & Sons, Inc. 

All rights reserved. Published simultaneously in Canada. 

Reproduction or translation of any part of 
this work beyond that permitted by Sections 
107 and 108 of the 1976 United States Copyright 
Act without the permission of the copyright 
owner is unlawful. Requests for permission 
or further information should be addressed to 
the Permissions Department, John Wiley & Sons. 

Library of Congress Cataloging in Publication Data : 

Taylor, Angus Ellis, 1911-

Advanced calculus. 

Includes index. 

1. Calculus. I. Mann, W. Robert (William 
Robert), 1920- II. Title. 

QA303.T2 1982 515 81-16141 

ISBN 0-471-02566-6 AACR2

Printed in the United States of America 




PREFACE 


The concept of a course in advanced calculus has a long history in American 
colleges and universities. Quite naturally, the concept has not remained static. 
Nor does it have exactly the same connotation to all teachers, students, and users 
of mathematics at any given time. There are a variety of needs to be served by 
a textbook in advanced calculus. In planning this edition of our book, we have 
retained the basic qualities that contributed to user satisfaction with the first two 
editions while changing and introducing some new features that we believe to be 
important as well as appropriate in the light of current conditions. 

We believe that the scope of a book on advanced calculus should be broad 
enough to: 

1. Build a bridge from elementary calculus to higher mathematical analysis suitable 
for use by students of two sorts, both those intending to specialize in mathematics 
and those who need more advanced mathematics as a tool in other studies or in 
their employment. 

2. Provide a thorough treatment of the calculus of functions of several variables and 
of vector functions of vector variables. 

3. Provide a firm grounding in the fundamentals of analysis, embracing at least the 
following topics: point set theory on the line and in Euclidean space, continuous 
functions and mappings, uniform convergence, the Riemann integral, and infinite 
series. 

4. Pay due attention to some topics important for applied mathematics, including 
some theory of curves and surfaces; vector fields; such notions as gradient, 
divergence, and curl and their occurrence in integral theorems; some notions about 
numerical methods and the potential for use of simple computers in problems 
calling for approximation or minimization; and improper integrals. 

Our mode of treatment of the material is based on our belief that the book 
should be more than a skeletal framework of tersely stated assumptions, 
definitions, theorems, and proofs. The exposition is designed to make the book 
readable and understandable by a student through his/her own efforts if he/she 
will read carefully and learn to verify or carry through for himself/herself the 
steps of reasoning that are either given or indicated. An important part of an 
advanced calculus course is the training that it entails in deductive reasoning from 
explicit assumptions and definitions. The best way of assuring that a student will
benefit from this training is to make the process interesting by providing suitable
motivation and enough guidance to assure to the diligent student the pleasure of 
success. With confidence thus gained the student may proceed to become an ever 
more independent learner. 

Our book is written on the assumption that students using it have normal skill 
in the formal aspects of elementary calculus and that they can draw freely on the 
standard subject matter of algebra, trigonometry, analytic geometry, and 
elementary calculus. In Chapter 1, we present a systematic overview of the more 
theoretical side of elementary calculus of functions of one real variable, but 
without a full treatment of those topics that depend in a crucial way on a rigorous 
exposition of the properties of the real number system. Such an exposition is 
given in Chapter 2, followed in Chapter 3 by a rigorous presentation of those 
properties of continuous real functions of a real variable that are essential to the 
theoretical structure of differential calculus of such functions. These chapters 
form a part of the bridge to higher mathematical analysis. The extent to which 
Chapter 1 will need to be formally included in a course of advanced calculus will 
depend on the level of preparation of the students and judgment of the teacher. 
Even if not extensively used for regular assignments, Chapter 1 is useful for 
reference and supplementary reading and study and can provide motivation for 
the work in Chapters 2 and 3. 

Chapter 4 is on several special topics somewhat apart from the mainstream 
of a course on advanced calculus. The results are occasionally used in important 
ways here and there in the book; they are available for reference and can be taken 
up by the individual student or by the teacher if the need arises. 

Parts of Chapters 6 and 13 will be to some extent familiar to students from
work in more elementary courses. It is feasible to omit Chapter 13 almost entirely 
from a course based on the book except insofar as some parts early in Chapter 
13 may be needed for reference in connection with §14.6 and §§18.6 and 18.61. 

A thorough treatment of the differential calculus of real functions of several 
variables and vector functions of vector variables is provided in Chapters 6, 7, 
8, 9, and 12, with important supporting material on vectors, matrices, and linear 
transformations in Chapters 10 and 11. Two different approaches to implicit 
function theorems and inversions of mappings are provided: without vectors in 
Chapters 8 and 9, with vectors in Chapter 12. The vector approach is the modern 
way, and it has many advantages. But the older, classical, way of dealing with 
implicitly defined real functions of several variables (Theorem I, §8.1) is simple,
elegant, and very instructive; it deserves to be remembered. We think the local
inversion of mappings should be understood and appreciated in the context of
mappings from $R^2$ to $R^2$, as in Chapter 9, before dealing with the general inversion
theorem in §12.6. 

The amount of linear algebra needed to support the vector differential 
calculus of Chapter 12 is presented in Chapter 11 (and, to some extent, in parts 
of Chapter 10). Chapter 11 also contains the minimum amount of material on 
norms and metrics that is needed to discuss continuity and differentiability of 
vector functions of vectors. Our discussion is, for the most part, limited to finite
dimensional vector spaces with the standard orthonormal system of basis vectors
associated with a Cartesian system of coordinates. Thus we regularly use the
Euclidean norm and metric in $R^n$. We also use the matrix representation of linear
transformations from $R^n$ to $R^m$ that goes along with the use of standard
orthonormal bases. Students using the book are expected to have some familiarity 
with matrices and elementary linear algebra. Some glimpses using more abstract 
points of view are offered. 

In Chapter 12 we present with care the definitions of the differential and the 
derivative (taken in that order) of a vector function from $R^n$ to $R^m$. This is a
subject that has needed some clarification in the textbook literature. If $f(x)$ is a
function from a part of $R^n$ to $R^m$, differentiable for certain values of $x$, the
differential $df$ of $f$ is a function of two variables $x$ and $dx$ that is linear in $dx$ (where
$dx$ varies over all of $R^n$) and has values in $R^m$. For a given $x$ at which $f$ is
differentiable the linear transformation that takes $dx$ into the value of $df$ at $(x, dx)$
is called the derivative of $f$ at $x$ and denoted by $f'(x)$. Thus the value of $df$ is
$f'(x)\,dx$, where the juxtaposition means that $f'(x)$ acts on $dx$. Observe that the
value of $f'$ is a linear transformation from $R^n$ to $R^m$, whereas the value of $df$ is
a vector in $R^m$. Thus the derivative and the differential are quite different
functions. 
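
The distinction between the derivative and the differential can be made concrete with a small numerical sketch in Python. Everything in it is an assumption chosen for illustration: the map `f` from $R^2$ to $R^2$, the names `derivative` and `differential`, and the test point. The derivative is returned as a matrix (a linear transformation), while the differential is returned as a vector obtained by letting that matrix act on $dx$.

```python
import math

def f(x):
    """An illustrative map from R^2 to R^2 (a hypothetical example)."""
    x1, x2 = x
    return (x1 * x1 * x2, x1 + math.sin(x2))

def derivative(x):
    """f'(x): the 2-by-2 matrix of partial derivatives, representing the
    linear transformation that carries dx into the value of df."""
    x1, x2 = x
    return ((2.0 * x1 * x2, x1 * x1),
            (1.0, math.cos(x2)))

def differential(x, dx):
    """df at (x, dx): the vector f'(x) dx in R^2."""
    rows = derivative(x)
    return tuple(sum(a * h for a, h in zip(row, dx)) for row in rows)

x = (1.0, 2.0)
dx = (1e-4, -2e-4)
df = differential(x, dx)
shifted = (x[0] + dx[0], x[1] + dx[1])
increment = tuple(b - a for a, b in zip(f(x), f(shifted)))

print("value of df      :", df)         # a vector in R^2
print("f(x + dx) - f(x) :", increment)  # agrees with df up to second-order terms
```

For small $dx$ the two printed vectors agree to within terms of second order in $dx$, which is the sense in which $df$ approximates the increment of $f$.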

A third important block of material for courses in advanced calculus is 
provided by Chapters 16, 17, 20, and at least the first part of Chapter 18. These 
provide essential material for the study of limits, convergence, and continuity for 
functions from $R^n$ to $R^m$ (including the special cases when $n = 1$ or $m = 1$, or both),
together with a completion of the theoretical structure of elementary integral
calculus. To this block may be added, if time permits, selections from Chapters
19, 21 and 22, depending on the interests of the class and the teacher. 

Most of the main topics mentioned thus far in this survey of the book are 
important for applied as well as for pure mathematics. The same is true for other 
topics yet to be mentioned, although the treatment of these topics is aimed rather 
more at those interested in applications than at those interested in pure theory. 
We are referring to the later part of Chapter 10 (on scalar and vector fields), most 
of Chapters 14 and 15 and parts of Chapter 22. Some selection from this body 
of material will very likely be appropriate and desirable in a course in advanced 
calculus with a broad clientele. 

There is another way of looking at mathematical analysis, in which one 
classifies the content of a book, not by the various topics and different subjects, 
but according to the following categories: ideas and concepts, theorems and 
proofs, and specific problems, together with methods for dealing with them and 
the information provided by the solution. These categories are not completely 
separate from each other, of course. Concepts enter into the statements and 
proofs of theorems. Some theorems provide answers to interesting problems, and 
techniques of proof may (but do not in all cases) furnish explicit solutions to 
problems. We believe that the vitality of mathematics derives in a highly 
significant way from interest in the solution of problems that can be formulated 
mathematically. The generation of powerful methods for solving problems
necessitates the building of theories (which may be somewhat abstract or
elaborate) to justify the methods and to amass a body of knowledge useful to 
those who apply the theories and the methods. We have endeavored to deal with 
our subject with a judicious (and, we hope, instructive) mixture of attention to 
concepts, theories, problems, and techniques of proofs and solutions. 

Among the specific ways in which this edition differs from the second edition 
are these: substantial changes in the discussion of quadratic forms in §6.9, a 
number of changes in Chapter 7, especially in the discussion of critical points in 
§7.6, a considerable revision of Chapter 9, addition to Chapter 10 of material on 
vectors in space of $n$ dimensions, a number of revisions in Chapters 11 and 12
(including major revisions of the treatment of the differential and the proof of the 
inversion theorem) and a change in the discussion of Stirling’s formula (§22.8) 
with a sharper result. 

An important innovation is the use made of programmable pocket calculators. 
This is in Chapter 12, which some students regard as forbiddingly theoretical. We 
think this may be partly due to the difficulty in developing a feeling for the 
derivative of a function from $R^n$ to $R^m$. To alleviate this difficulty we have
introduced Newton’s method at the level of problems involving the derivative of
functions from $R^n$ to $R^n$. We have also tried to present more clearly the gradient
as a derivative by including some numerical applications of the method of 
steepest descent. Along similar lines we give a generalization of the elementary 
product formula for differentiation, which gives the derivative of the scalar 
product of two vector-valued functions. This formula is then used to arrive at the 
method of least squares and the idea of a generalized matrix inverse. These 
practical applications cannot, of course, make the basic theory easier, but 
according to our experience, they are helpful to the students’ understanding. It 
is still too early to discern exactly what the optimal role of abundant, cheap, and
easy computational capability in teaching advanced calculus is going to be. We
believe however that the present state of calculator technology offers
opportunities to combine theory and practice in a way that illuminates both.

In conclusion we want to express, as we did in the second edition, our debt 
of gratitude to the students we have taught, from the teaching of whom we have 
learned much. We have enjoyed our teaching, partly because it has deepened our 
own learning, but especially when we have seen our students growing in 
understanding and appreciation of mathematical analysis. The questions and 
comments of students have often led us to new insights both in the subject and 
in our ways of teaching. We hope that other students and other teachers will find 
that this book opens the doors to understanding and enjoyment. 

Angus E. Taylor 

W. Robert Mann


1 / FUNDAMENTALS
OF ELEMENTARY 
CALCULUS 


1 / INTRODUCTION 

A course in advanced calculus must build upon the presumption that students 
studying the subject have already gained some knowledge of elementary
calculus. We shall therefore begin by taking a backward look over those parts of
calculus with which the reader of this book should have facility and a measure 
of understanding. Our object in such a retrospect is not to conduct a systematic 
review. The purpose is, rather, to establish a common point of view for students 
whose training in calculus, up to this point, must inevitably reflect a wide variety 
of practices in teaching, choice of subject matter, and distribution of emphasis 
between the acquisitions of problem-solving skills and mastery of fundamental 
theory. As we survey the field of elementary calculus we shall stress the 
conceptual aspect of the subject: fundamental definitions and processes which 
underlie all the applications. In a first course in calculus it is often the case that 
the fundamental notions are introduced through the medium of particular 
geometrical or physical applications. Thus, to the beginner, the derivative may 
be typified by, or even identified with, the speed of a moving object, while the 
integral is thought of as the area under a curve. We now seek to take a more 
general, or abstract, view. Differentiation and integration are processes which 
are carried out upon functions. We need to have a clear understanding of the 
definitions of these processes, quite apart from their applications.

Another aspect of our survey will be our concern with the logical unfolding 
of the fundamental principles of calculus. Here again we strive to take a more 
mature point of view. We wish to indicate in what respects it is desirable and 
necessary to look more deeply into the derivations of rules and proofs of 
theorems. There are places in elementary calculus, as usually taught to
beginners, where the development is necessarily inadequate from the standpoint
of logic. In many places the reasoning leans heavily on intuition or on one sort or 
another of plausibility argument. That this state of affairs persists is partly due to 
a deliberate placing of emphasis: we make our primary goal the attainment of 
skill in the manipulative techniques of calculus which lend themselves readily to 
applications at an elementary level in physics, engineering, and the like. This 
kind of skill (up to a certain point) can be imparted without paying much 
attention to questions of logical rigor. But it is also true that there are logical 
inadequacies in a first course in calculus which cannot be made good entirely 
within the customary time limits of such a course (two or three semesters), even
where a reasonably heavy emphasis is laid upon “theory.” At bottom the subject
of calculus rests upon the real number system and the theory of limits. A full 
appreciation and understanding of this foundation material must come slowly, 
but the need for such understanding becomes more acute as we progress in 
learning. In advanced calculus we must make a deeper study of the real number 
system, of the theory of limits, and of the properties of continuous functions. In 
this way only can we proceed easily and with confidence to a mastery of many 
new concepts and processes of higher mathematics. 


1.1 / FUNCTIONS 

At the very outset we must discuss the mathematical concept of a function, for
we shall constantly be talking about properties of functions and about processes 
which are applied to functions. The function concept has been very much 
generalized since the early development of calculus by Leibniz and Newton. At 
the present time the word “function” is used broadly to mean any determinate 
correspondence between two classes of objects. 

Example 1. Consider the class of all plane polygons. If to each polygon we make
correspond the number which is the perimeter of the polygon (in terms of some 
fixed unit of length), this correspondence is a function. Here the first class of 
objects is composed of certain figures, while the members of the second class are 
positive numbers. 

To begin with, let us consider functions which are correspondences between 
sets of real numbers. Such functions are called real functions of a real variable.
The first set of numbers is the domain of definition or simply the domain of the 
function. The second set, consisting of the values taken on by the function, is 
called the range. Once the domain, which we may call D, has been specified, the 
function is defined as soon as a definite rule of correspondence has been given, 
assigning to each number of D some corresponding number in the range. If x is a 
symbol which may be used to denote any member of D, we call x the 
independent variable of the function. In some situations it is very natural to have 
more than one number associated with a given value of x and to call such a 
correspondence a multiple-valued function. If each value of x corresponds to 
just one number in the range we have a single-valued function, which is what is
properly meant by the term function. We usually find it possible to deal with
multiple-valued functions by separating them into several (possibly infinitely
many) single-valued functions. Hereafter we shall always assume that all
functions referred to are single-valued, unless the situation explicitly indicates the
contrary. 

A function may be defined by an algebraic or trigonometric formula, but it 
need not be so. 

Example 2. If $x$ denotes any real number, let $[x]$ denote the algebraically
largest integer which does not exceed $x$; e.g.,

$$[-2.3] = -3, \quad [-1] = -1, \quad [0] = 0, \quad [3.5] = 3, \quad [7] = 7, \quad [7.2] = 7.$$

The correspondence between $x$ and $[x]$ defines a function. If we use $f$ to denote
this function, then we would say that $f$ is defined by $f(x) = [x]$.
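
The bracket function of Example 2 is easy to try out numerically. The following is a minimal Python sketch, assuming that the standard-library function `math.floor` agrees with $[x]$ as defined above; the name `bracket` is hypothetical.

```python
import math

def bracket(x):
    """Return [x], the largest integer which does not exceed x."""
    return math.floor(x)

# The sample values listed in Example 2.
samples = [-2.3, -1, 0, 3.5, 7, 7.2]
for x in samples:
    print(x, "->", bracket(x))

# Truncation toward zero is a different function: int(-2.3) is -2, not -3.
print(int(-2.3), bracket(-2.3))
```

The last line is a reminder that for negative non-integers, discarding the fractional part does not give $[x]$.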

Example 3. Another simple function is defined by associating with $x$ its
absolute value $|x|$. The definition of $|x|$ is:

$$|x| = x \ \text{ if } x \ge 0, \qquad |x| = -x \ \text{ if } x < 0.$$

Thus $|7| = 7$, $|0| = 0$, $|-5| = 5$, $|3 - 10| = 7$. If we think of $x$ as a point on a number
scale (the $x$-axis), then $|x|$ is the numerical distance (always nonnegative)
between $x$ and the origin.

The concept and the symbolism of absolute value are quite important. The
student will need to get accustomed to reading sentences that contain
inequalities and absolute values. Thus, for instance, $|7 - 5| = 2$, $|-16 - (-10)| = 6$,
and, in general, $|x_1 - x_2|$ is the distance between points $x_1$ and $x_2$ on the $x$-axis. As
another example, $|x - 5| < 2$ means that the distance between $x$ and 5 is less than
2; this is equivalent to saying that $x$ lies between 3 and 7. We can write this in
the form $3 < x < 7$. A general statement of the same sort is that $|x - a| < b$ (where
$b > 0$) is equivalent to $a - b < x < a + b$.
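
The equivalence between $|x - a| < b$ and $a - b < x < a + b$ can be spot-checked on a grid of sample points. The Python sketch below is only a numerical illustration under assumed values ($a = 5$, $b = 2$, and a grid of quarter-steps from 0 to 10); it does not replace the short proof from the definition of absolute value. The function names are hypothetical.

```python
def within(x, a, b):
    """True when the distance from x to a is less than b."""
    return abs(x - a) < b

def between(x, a, b):
    """True when x lies strictly between a - b and a + b."""
    return a - b < x < a + b

def agree_on_grid(a, b, points):
    """Check that the two formulations give the same answer at every point."""
    return all(within(x, a, b) == between(x, a, b) for x in points)

grid = [k / 4 for k in range(0, 41)]   # 0, 0.25, 0.5, ..., 10
print(agree_on_grid(5.0, 2.0, grid))
print([x for x in grid if within(x, 5.0, 2.0)][:3])   # first few points with 3 < x < 7
```

The endpoints 3 and 7 are excluded by both formulations, since the inequalities are strict.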

We regard functions as mathematical entities, and represent them by
symbols. The commonly used symbols are the Latin letters $f$, $g$, $h$, $F$, $G$, $H$, and the
Greek letters $\psi$, $\Phi$, $\Psi$, but in principle any symbol may be used. If $f$ is the
symbol for a particular function, we use $f(x)$ to represent the number which the
function makes correspond to any particular value of the independent variable
$x$; this is called the value of the function at $x$.

Example 4. Let $f$ be the symbol for the function which makes correspond to
a positive number the natural logarithm of that number. Then $f(x) = \log_e x$. (We
shall normally drop the subscript $e$ and write $\log x$ in place of $\log_e x$.)

There is some ambiguity in the use of functional notation, for $f(x)$ is
frequently used as a symbol for the function itself, as well as for the value of the
function. Thus, for example, we speak of “the function $\sin x$,” “the function
$x^2 - 3x + 5$,” or “the function $\phi(x)$.” There is of course a difference between the
function and the value of the function. If the symbol $f(x)$ appears, the context
will usually make clear whether reference is being made to the function or to
the value of the function. To avoid possible ambiguity we shall cultivate the
practice of writing “the function $f$” instead of “the function $f(x)$.” This usage is
in accord with prevalent practice in current literature, and the student will do
well to become familiar with it.

If $y$ is a symbol for the value of the function $f$ at $x$, we can write $y = f(x)$.
Here $y$ is called the dependent variable; we say that $y$ is a function of $x$. In
elementary calculus most of the stress is upon functions which are defined by
means of fairly simple formulas connecting the independent variable $x$ and the
dependent variable $y$. Here, however, we look toward understanding the
principles of calculus as they apply to functions which are arbitrary except insofar
as they are restricted by specified hypotheses.

We shall in due course have to deal with functions of more than one 
variable. The general notion of a function is still that of a correspondence. A real 
function F of two real variables x, y is a correspondence which assigns a 
number $F(x, y)$ as the value of the function corresponding to the pair of values
x, y of the two independent variables. The use of functional notation and the 
designation of the function by the single letter F require no detailed comment, 
since the basic ideas are no different from those already explained. 

The characteristic feature of calculus is its use of limiting processes. 
Differentiation and integration involve certain notions of passage to a limit. A 
fuller discussion of ideas about limits is presented later on in this chapter 
(§§1.6-1.64). Here we wish to touch on only one limit notion, that of the limit of 
a real function of one real variable. This notion is fundamental in the definition of 
a derivative. 

Suppose $f$ is a function which is defined for all values of $x$ near the fixed
value $x_0$, and possibly, though not necessarily, at $x_0$ as well. We wish to attach a
clear meaning to the statement: $f(x)$ approaches $A$ (or tends to the limit $A$) as $x$
approaches $x_0$. The symbolic form of the statement is

$$\lim_{x \to x_0} f(x) = A. \tag{1.1-1}$$

The symbol $A$ is understood to stand for some particular real number. The arrow
is used as a symbol for the word “approaches.” Sometimes (1.1-1) is expressed
in the form $f(x) \to A$ as $x \to x_0$. Here are three typical examples of statements of
this kind: (a) $x^3 \to 8$ as $x \to 2$, (b) $(x - 1)^{1/2} \to 3$ as $x \to 10$, (c) $\log_{10} x \to 2$ as
$x \to 100$.

Definition. The assertion (1.1-1) means that we can insure that the absolute value
$|f(x) - A|$ is as small as we please merely by requiring that the absolute value $|x - x_0|$
be sufficiently small, and different from zero. This verbal statement is expressible
in terms of inequalities as follows: Suppose $\epsilon$ is any positive number. Then there
is some positive number $\delta$ such that

$$|f(x) - A| < \epsilon \quad \text{if} \quad 0 < |x - x_0| < \delta. \tag{1.1-2}$$

Note that $0 < |x - x_0|$ is the same as $x \ne x_0$. Note also that $|f(x) - A| < \epsilon$ is the
same as $A - \epsilon < f(x) < A + \epsilon$, and $|x - x_0| < \delta$ the same as $x_0 - \delta < x < x_0 + \delta$.

We can give a geometrical portrayal of the inequalities (1.1-2). Let the points
$(x, y)$ with $y = f(x)$ be located on a rectangular co-ordinate system; also locate
the point $(x_0, A)$. For any $\epsilon > 0$ draw the two horizontal lines $y = A \pm \epsilon$. Now
(1.1-1) means that, by choosing $\delta$ small enough, those points of the graph of
$y = f(x)$ which lie between the two vertical lines $x = x_0 \pm \delta$ and not on the line
$x = x_0$ will also lie between the horizontal lines $y = A \pm \epsilon$. Fig. 1 shows a
specimen of this situation. The diagram also shows how $\delta$ may have to be made
smaller as $\epsilon$ becomes smaller.

It is to be emphasized that (1.1-1) places no restrictions whatever on the value
of $f$ at $x_0$, in case it is defined at that point.

Appreciation of the formal definition of the meaning of (1.1-1) takes time and
experience. The formal definition is the basis for exact reasoning on matters
involving the limit concept. But it is also quite important to develop an intuitive
understanding of the notion of a limit. This may be done by considering a large
number of illustrative examples and by observing the way in which the limit
concept is used in the development of calculus. One needs to learn by example
how a function $f(x)$ may fail to approach a limit as $x$ approaches $x_0$.

The variable $x$ may approach $x_0$ from either of two sides. Let us use $x \to x_0+$
to indicate that $x$ approaches $x_0$ from the right, and $x \to x_0-$ to indicate approach
from the left. The conditions for $\lim_{x \to x_0} f(x) = A$ are then that $f(x) \to A$ as
$x \to x_0+$ and also $f(x) \to A$ as $x \to x_0-$. In terms of inequalities the meaning of
$f(x) \to A$ as $x \to x_0+$ is this: to any $\epsilon > 0$ corresponds some $\delta > 0$ such that
$|f(x) - A| < \epsilon$ if $x_0 < x < x_0 + \delta$. The meaning of $f(x) \to A$ as $x \to x_0-$ may be
expressed in a similar way.

Example 5. The limit of $f(x)$ as $x \to x_0$ may fail to exist because:

(a) The limits from right and left exist but are not equal. This is the case
with

$$f(x) = 1 + \frac{|x|}{x},$$

where $f(x) \to 2$ as $x \to 0+$ and $f(x) \to 0$ as $x \to 0-$.

(b) The values of $f(x)$ may get larger and larger (tend to infinity) as $x \to x_0$
from one side or the other, or from both sides. This is the case with $f(x) = 1/x$ as
$x \to 0$.

(c) The values of $f(x)$ may oscillate infinitely often, approaching no limit.
This is the case with $f(x) = \sin(1/x)$, which oscillates infinitely often between $-1$
and $+1$ as $x \to 0$ from either side.

The graphs of the three foregoing functions are shown in Figs. 2a, 2b, and 2c,
respectively.
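
The three kinds of failure in Example 5 can be observed numerically. The Python sketch below is an illustration under stated assumptions: case (a) is taken to be $f(x) = 1 + |x|/x$, the function names are hypothetical, and tabulated values only suggest the behavior of a function near 0; they prove nothing about the limit.

```python
import math

def f_a(x):
    """Case (a): one-sided limits 2 and 0 at x = 0 (assumed form 1 + |x|/x)."""
    return 1 + abs(x) / x

def f_b(x):
    """Case (b): values grow without bound as x -> 0."""
    return 1 / x

def f_c(x):
    """Case (c): oscillation between -1 and +1 as x -> 0."""
    return math.sin(1 / x)

for k in range(1, 6):
    h = 10.0 ** -k
    print(h, f_a(h), f_a(-h), f_b(h), f_b(-h))

# In case (c), arbitrarily close to 0 there are points where the value is
# (up to rounding) +1 and points where it is -1.
for n in (10, 100, 1000):
    x_top = 1 / ((2 * n + 0.5) * math.pi)
    x_bottom = 1 / ((2 * n + 1.5) * math.pi)
    print(x_top, f_c(x_top), x_bottom, f_c(x_bottom))
```

The points `x_top` and `x_bottom` are chosen so that $1/x$ is an odd multiple of $\pi/2$, which is why the printed values sit at the extremes of the oscillation.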

Example 6. If $f(x) = e^{-1/x^2}$, then $\lim_{x \to 0} f(x) = 0$. To “see” the correctness of
this result, one must have clearly in mind the nature of the exponential function.
When $x$ is near zero, $-1/x^2$ is large and negative; now $e$ raised to a large negative
power is a small positive number. Hence $e^{-1/x^2}$ is nearly 0 when $x$ is nearly 0, and
$f(x) \to 0$ as $x \to 0$. This is an example of a rough intuitive argument leading to a
conclusion about a certain limit.

It is instructive to see how the intuitive argument is made precise by
reference to the definition of a limit. The statement $\lim_{x \to 0} e^{-1/x^2} = 0$ means that to
each $\epsilon > 0$ corresponds some $\delta$ such that

$$|e^{-1/x^2} - 0| < \epsilon \quad \text{if} \quad 0 < |x - 0| < \delta. \tag{1.1-3}$$

Let us see how we may find a suitable $\delta$ when $\epsilon$ is given. In doing this we take
for granted the properties of the exponential and logarithmic functions.

Since $e$ to any power is positive, $|e^{-1/x^2} - 0| < \epsilon$ is equivalent to $e^{-1/x^2} < \epsilon$. We
rewrite this inequality in several successive equivalent forms:

$$\frac{1}{\epsilon} < e^{1/x^2}, \qquad \log\frac{1}{\epsilon} < \frac{1}{x^2}.$$

Let us suppose that $\epsilon < 1$, so that $\log(1/\epsilon) > 0$. Then further equivalent forms are

$$0 < x^2 < \frac{1}{\log(1/\epsilon)}, \qquad 0 < |x| < \left[\frac{1}{\log(1/\epsilon)}\right]^{1/2}.$$

It now appears that, if $0 < \epsilon < 1$, we can choose

$$\delta = \left[\frac{1}{\log(1/\epsilon)}\right]^{1/2},$$

and then (1.1-3) will hold, as required.
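
The choice of $\delta$ in Example 6 can be tested by sampling. The Python sketch below assumes the choice is $\delta = [1/\log(1/\epsilon)]^{1/2}$ for $0 < \epsilon < 1$; the helper names and the sampling scheme are hypothetical. Sampling finitely many points cannot establish (1.1-3), but a failure at any sample point would expose a wrong $\delta$.

```python
import math

def f(x):
    """f(x) = exp(-1/x^2), defined for x != 0."""
    return math.exp(-1.0 / (x * x))

def delta_for(eps):
    """The delta of Example 6, assumed to be [1 / log(1/eps)]^(1/2)."""
    if not 0 < eps < 1:
        raise ValueError("this choice of delta requires 0 < eps < 1")
    return math.sqrt(1.0 / math.log(1.0 / eps))

def holds_on_samples(eps, samples=1000):
    """Check |f(x) - 0| < eps at sample points with 0 < |x| < delta."""
    d = delta_for(eps)
    for k in range(1, samples + 1):
        x = d * k / (samples + 1)          # 0 < x < d
        if not (f(x) < eps and f(-x) < eps):
            return False
    return True

for eps in (0.5, 0.1, 1e-3, 1e-6):
    print(eps, delta_for(eps), holds_on_samples(eps))
```

Note how slowly $\delta$ shrinks as $\epsilon$ decreases: the function is extremely flat near 0, so a comparatively wide interval about 0 already keeps $f(x)$ below $\epsilon$.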

Even in very obvious situations it is worth while to practice finding a $\delta$
corresponding to a given $\epsilon$, just to drive home an appreciation of the meaning of
the definition of a limit.

Example 7. Given $\epsilon > 0$, find $\delta$ so that $|f(x) - 4| < \epsilon$ if $0 < |x - 2| < \delta$, where
$f(x) = x^3 - x^2$. This will show that $\lim_{x \to 2} (x^3 - x^2) = 4$. We have

$$x^3 - x^2 - 4 = (x - 2)(x^2 + x + 2).$$

To begin with, let us consider only values of $x$ such that $|x - 2| < 1$, or $1 < x < 3$.
For such $x$ we certainly have $4 < x^2 + x + 2 < 14$, and hence

$$|x^3 - x^2 - 4| \le 14\,|x - 2|.$$

Now we see that $|x^3 - x^2 - 4| < \epsilon$ provided $14\,|x - 2| < \epsilon$, or $|x - 2| < \epsilon/14$. Hence
we choose for $\delta$ any positive number such that both $\delta \le 1$ and $\delta \le \epsilon/14$. This
choice meets the requirements.
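
The recipe of Example 7 translates directly into a small check. The Python sketch below takes the largest admissible value, $\delta = \min(1, \epsilon/14)$, and samples points with $0 < |x - 2| < \delta$; the function names and the sampling scheme are hypothetical, and a sampled check illustrates the estimate rather than proving it.

```python
def f(x):
    """f(x) = x^3 - x^2, whose limit at x = 2 is 4."""
    return x ** 3 - x ** 2

def delta_for(eps):
    """Largest delta allowed by Example 7: delta <= 1 and delta <= eps/14."""
    return min(1.0, eps / 14.0)

def holds_on_samples(eps, delta, samples=2000):
    """Check |f(x) - 4| < eps at sample points with 0 < |x - 2| < delta."""
    for k in range(1, samples + 1):
        h = delta * k / (samples + 1)      # 0 < h < delta
        if not (abs(f(2 + h) - 4) < eps and abs(f(2 - h) - 4) < eps):
            return False
    return True

for eps in (100.0, 14.0, 1.0, 0.1, 1e-3):
    d = delta_for(eps)
    print(eps, d, holds_on_samples(eps, d))

# A delta that ignores the bound eps/14 can fail: with eps = 1 and delta = 1,
# the point x = 2.9 gives |f(x) - 4| of roughly 12.
print(holds_on_samples(1.0, 1.0))
```

The final line shows why both conditions on $\delta$ are needed: the bound $\delta \le 1$ alone secures the factor 14, but not the inequality itself.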

Reasoning with limits is facilitated by various simple theorems. Among the 
most important such theorems are the following rules, which we state here 
informally: 

Suppose that

$$\lim_{x \to x_0} f(x) = A \quad \text{and} \quad \lim_{x \to x_0} g(x) = B;$$

then

$$\lim_{x \to x_0} [f(x) + g(x)] = A + B, \tag{1.1-4}$$

$$\lim_{x \to x_0} [f(x)\,g(x)] = AB, \tag{1.1-5}$$

$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = \frac{A}{B}, \quad \text{provided } B \ne 0. \tag{1.1-6}$$

Formal proofs of the validity of these three rules are made in §1.64. Meanwhile
we accept them and use them.

Closely related to the limit concept is the concept of continuity. 

Definition. Suppose the function $f$ is defined at $x_0$ and for all values of $x$ near $x_0$.
Then the function is said to be continuous at $x_0$ provided that

$$\lim_{x \to x_0} f(x) = f(x_0). \tag{1.1-7}$$

Most of the functions which we deal with in calculus are continuous; points
of discontinuity are exceptional, but may occur. A function may fail to be
continuous at $x_0$ either because $f(x)$ does not approach any limit at all as $x \to x_0$,
or because it approaches a limit which is different from $f(x_0)$.

Example 8. The function $f(x) = [x]$ (see Example 2) is discontinuous at $x_0$ if
$x_0$ is an integer, but is continuous at $x_0$ if $x_0$ is not an integer.

We observe in this case that $\lim_{x \to 2} f(x)$ does not exist, for when $x$ is near 2,
$f(x) = 1$ if $x < 2$ and $f(x) = 2$ if $x > 2$. The situation is similar at other integers.

The graph of $y = f(x)$ is shown in Fig. 3. At the breaks in the graph when $x$ is an
integer $n$, the value of $f(n)$ is indicated by a heavy dot.
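
The jump of $[x]$ at an integer shows up clearly when the function is evaluated at points closing in from each side. The Python sketch below assumes `math.floor` as the bracket function; the helper name `one_sided_values` and the step sizes $10^{-k}$ are hypothetical choices.

```python
import math

def f(x):
    """f(x) = [x], the largest integer not exceeding x."""
    return math.floor(x)

def one_sided_values(x0, steps=8):
    """Values of f at x0 - 10^-k and x0 + 10^-k for k = 1, ..., steps."""
    left = [f(x0 - 10.0 ** -k) for k in range(1, steps + 1)]
    right = [f(x0 + 10.0 ** -k) for k in range(1, steps + 1)]
    return left, right

print(one_sided_values(2))     # left values stay at 1, right values at 2
print(one_sided_values(2.5))   # both sides stay at 2 = f(2.5)
```

At $x_0 = 2$ the values from the left and from the right settle on different numbers, so no single limit exists; at $x_0 = 2.5$ both sides agree with $f(2.5)$.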

Example 9. Suppose we define a function by

$$f(x) = [x] + [2 - x] - 1.$$

Direct inspection shows the following:

$$\begin{aligned}
f(0) &= 1, \\
f(x) &= 0 \quad \text{if } 0 < x < 1, \\
f(1) &= 1, \\
f(x) &= 0 \quad \text{if } 1 < x < 2, \\
f(2) &= 1.
\end{aligned}$$

Fig. 4. $f(x) = [x] + [2 - x] - 1$

Consequently, $\lim_{x \to 1} f(x) = 0$; but $f(1) = 1$, and so $f$ is not continuous at $x = 1$.
The graph of $y = f(x)$ is indicated in Fig. 4. From the definition it may be seen
that $f(n) = 1$ if $n$ is an integer and $f(x) = 0$ if $n < x < n + 1$.
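
The values claimed in Example 9 can be confirmed by direct evaluation. The Python sketch below assumes the formula $f(x) = [x] + [2 - x] - 1$, as printed with Fig. 4, and uses `math.floor` for the bracket; the sample points are our own choice.

```python
import math

def f(x):
    """f(x) = [x] + [2 - x] - 1."""
    return math.floor(x) + math.floor(2 - x) - 1

# The value is 1 at every integer ...
print([f(n) for n in range(-3, 4)])

# ... and 0 strictly between consecutive integers.
print([f(n + 0.5) for n in range(-3, 3)])

# Approaching x = 1 from either side the values are 0, although f(1) = 1.
print([f(1 - 10.0 ** -k) for k in range(1, 9)],
      [f(1 + 10.0 ** -k) for k in range(1, 9)],
      f(1))
```

This is a removable discontinuity: the limit at each integer exists and equals 0, but the function value there is 1.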

Example 10. Let us define $f(x) = (\sin x)/x$ if $x \neq 0$. This definition of $f(x)$ has no meaning if $x = 0$, since division by 0 is undefined. However, let us make the additional definition $f(0) = 1$. With this definition, $f$ is continuous at $x = 0$. For, as we learn in elementary calculus,

$$\lim_{x \to 0} \frac{\sin x}{x} = 1.$$ (1.1-8)

Since we have defined $f(0) = 1$, (1.1-8) shows that $\lim_{x \to 0} f(x) = f(0)$; therefore $f$ is continuous at $x = 0$, by the definition.
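
A numerical illustration of Example 10 may be helpful. This is only a sketch: it evaluates the function at a few illustrative points near 0 and does not replace the limit argument (1.1-8).

```python
import math

def f(x):
    """Example 10: (sin x)/x for x != 0, with the additional definition f(0) = 1."""
    return math.sin(x) / x if x != 0 else 1.0

# The gap |f(x) - f(0)| shrinks as x approaches 0 from either side.
for x in (0.5, 0.1, 0.01, -0.01, 0.001):
    print(x, f(x), abs(f(x) - f(0)))
```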

We have based the concept of continuity directly upon the concept of a limit. A condition for continuity of a function may be given directly in terms of inequalities, just as we defined a limit in terms of inequalities. Thus, if $f$ is defined throughout some interval containing $x_0$ and all points near $x_0$, $f$ is continuous at $x_0$ if to each positive $\epsilon$ corresponds some positive $\delta$ such that

$$|f(x) - f(x_0)| < \epsilon \quad \text{whenever} \quad |x - x_0| < \delta.$$ (1.1-9)

This form of the condition for continuity is equivalent to the original definition.
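
To make condition (1.1-9) concrete, here is a hedged sketch for the illustrative function $f(x) = x^2$, which is not singled out in the text at this point. The helper names `delta_for_square` and `check` are hypothetical, and the sampling only spot-checks the inequality; the docstring carries the actual argument.

```python
def delta_for_square(x0, eps):
    """Return a delta that works for f(x) = x**2 at x0 in condition (1.1-9).

    If |x - x0| < delta <= 1, then |x + x0| < 2|x0| + 1, so
    |x**2 - x0**2| = |x - x0| * |x + x0| < delta * (2|x0| + 1) <= eps.
    """
    return min(1.0, eps / (2 * abs(x0) + 1))

def check(x0, eps, steps=1000):
    """Sample points with |x - x0| < delta and confirm |f(x) - f(x0)| < eps."""
    delta = delta_for_square(x0, eps)
    for k in range(-steps + 1, steps):
        x = x0 + delta * k / steps
        if abs(x * x - x0 * x0) >= eps:
            return False
    return True

print(check(3.0, 0.01))
```

The choice of $\delta$ depends on both $\epsilon$ and $x_0$, which is typical of such arguments.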

Many common words are used in mathematics in a specialized way. Usually 
the mathematical meaning of a word has some relation to the common meaning 
of the word; but mathematical meanings are precise, whereas common meanings 
are broad or variable. The adjective “continuous” is a word of this kind, with a 
restrictive and precise mathematical meaning. Experience shows that students 
tend to read more, in the way of preconceived notions about the meaning of the 
term, into the word “continuous” than is implied by the definition. In analytic 
geometry and calculus we become familiar with the graphs of many functions, 
and there is a tendency to associate the term “continuous function” with the 
picture of a smooth, unbroken curve. Now it is true that if $f$ is continuous at each point of an interval, the corresponding part of the graph of $y = f(x)$ will be an unbroken curve. But it need not be smooth. Smoothness is related to differentiability; the more derivatives $f$ has, the smoother is its graph. A function may be continuous without having a derivative. In that case the graph of $y = f(x)$ might be so crinkly, so devoid of smoothness, as to make correct visualization of it quite impossible.


EXERCISES 

Where the square-bracket notation occurs in these exercises, $[f(x)]$ denotes the algebraically largest integer which is $\le f(x)$ (see Example 2).

1. Find each of the limits indicated, using algebraic simplification and the rules (1.1-4)-(1.1-6).

, , x 2 - 16 

(a) 

„10 i 

(b) lim 


(e) !^;fcbrs) ; 


(c) lim 


I x 
x" 


(f) lim 


(d) lim 

x-*0 


j x - 1 
(x + 2) 2 — 4 


(ri a positive integer); 


(g) lim 

x — *2 


(h) lim 


o x \(4+ x) 
x 3 + x 2 - 5x -2 


x 2 — 4 

108(x 2 +2x)(x+l)- 
(x 3 + l) 3 (x- 1) 


2. Find each of the following limits, using roughly quantitative arguments based on 
your knowledge of the various functions involved (somewhat as in the first paragraph of the 
discussion of Example 6). 

(a) liir^ 10 '* 2 ; 

x -*0 


(b) limcos(e 1/x2 ); 

x — »0 


(c) lim 


1 + cos x 


*->o l + (logx 2 ) 2 ’ 

(d) lim tan~’^tan 2 ^, where the inverse tangent has its principal value, i.e., v = tan -1 u 

7T IT 

means u = tan v and - — < v < y , 

(e) limlog^ 1 ^). 

x-*0 \ x / 


3. Draw the graphs for each of the following functions and then answer the questions: 

(a) $f(x) = 2x - 1$ if $x \le 1$, $f(x) = 6 - 5x$ if $x > 1$. Is $f$ continuous at $x = 1$?

(b) $f(x) = (x^2/2) - 2$ if $0 < x < 2$, $f(x) = 2 - (8/x^2)$ if $2 < x$. Does $\lim_{x \to 2} f(x)$ exist? How should $f(2)$ be defined to make $f$ continuous at $x = 2$?

(c) $f(x) = [1 - x^2]$. Consider only $-1 \le x \le 1$. Does $\lim_{x \to 0} f(x)$ exist? Is $f$ continuous at $x = 0$?

(d) $f(x) = (x - 1)[x]$. Consider only $0 \le x \le 2$. Is $f$ continuous at $x = 1$?

(e) $f(x) = x/|x|$ (undefined if $x = 0$). Does $\lim_{x \to 0} f(x)$ exist?

4. In each of the following cases $f$ is defined by the given formula only if $x \neq 0$. How should $f(0)$ be defined to make $f$ continuous at $x = 0$?

(a) /(x) = 5i^ c ; (b) f(x) = (c) /(x) = (x + ^~ 8 ; (d) fix) = KT 1 '* 2 . 

Exercises 5-7 form a natural unit. 

5. If $f(x) = cx$, where $c$ is a constant, show that $\lim_{x \to x_0} f(x) = f(x_0)$ by applying the definition of a limit as expressed by (1.1-2). If $c \neq 0$ what can you take $\delta$ to be in terms of $\epsilon$ and $c$?

6. If $c$ is constant and $n$ is a positive integer, show that $\lim_{x \to x_0} cx^n = cx_0^n$. Use Exercise 5, mathematical induction, and (1.1-5).

7. By a polynomial in $x$ we mean a function defined by an expression

$$P(x) = a_0 x^n + a_1 x^{n-1} + \cdots + a_n$$

where the coefficients $a_0, a_1, \ldots, a_n$ are constants, and $n$ is an integer $\ge 0$. If $n = 0$, $P(x)$ is constant in value, and the degree of $P(x)$ is said to be zero. If $n \ge 1$ and $a_0 \neq 0$, we say that the degree of the polynomial is $n$. Prove that $P(x)$ is continuous at every point $x_0$. Use the result of Exercise 6. What other result about limits do you use?

8. By a rational function of $x$ we mean a function defined by an expression

$$R(x) = \frac{p(x)}{P(x)},$$

where $p(x)$ and $P(x)$ are polynomials (see Exercise 7). The function is defined except when $P(x) = 0$. Show that it is continuous at $x_0$ if it is defined there. Use definition (1.1-7) and state exactly what appeal you make to facts about limits stated in the text or established in previous exercises.

9. If /(x) = sint (a) find /(~)> n = 1, 2, . . . ; (b) find n = 1, 5,9, ... ; 

(c) find f(—\ n = 3,7,11,.... (d) How does the derivative f'(x) behave as x->0? 
Mitt / 


10. (a) How does 2 1/x behave as x->0+? (b) as x -►()-? XO What can you say 
about lim — —pr ? 

x-*0 J+Z 


11. Graph each of the following functions: (a) |x|, (b) |x — 1|, (c) |x + 2|, 

(d) |x 3 |, (e) |1 — x 2 |. Do any of these functions have any points of discontinuity? 

12. Graph the function x - [x] and discuss its discontinuities. 

13. Which of the following functions is continuous at $x = 0$? (a) $[x^2 + 2]$, (b) $[4 - x^2]$, (c) $[x^2 - 1]$. Graph each function when $-1 \le x \le 1$.

14. If/(x) = k + JC J~ [ ^ , (a) find /(l),/(—l),/(j),/(— 5 ), /(?),/(— i). (b) Withoutusing 

the square brackets, write expressions for /(x) if 0 <x <{ and if — | <x < 0. (c) What is 

lim x -o/(x)? 


15. If /(x) = (a) find f(i), f(l), /(I), /(I), /(f), /(!)■ (*») Express f(x) without 

absolute values if x > 1; if 0 <x < 1. (c) What can you say about lim^i /(x)? 

16. If/(x) = l 2 + *l~ M , (a) find/(l),/(-l),/(— 2),/(2). (b) Write an expression 

for /(x) without absolute values if 0<x; if -2 <x <0. (c) What can you say about 

lim x ^o/(x)? 

17. If f(x) = [7x 2 - 14], (a) find /( 0), /( 1), /( 2), /(J), /(I), /( i). (b) Is f continuous at 
$x = 0$? (c) Is it continuous at $x = 1$? (d) at $x = \sqrt{2}$?

18. If /(x) = [sin x], (a) find/(0),/(7r/2),/(-W2),/(7r/4),/(-7r/4). (b) Does lim /(x) 

exist? (c) What does /(x) approach as x-»(7t/ 2)-? ( d ) as x^>0-? 

19. Prove that $\lim_{x \to 1}(x^2 + 2x) = 3$ by finding $\delta$ in terms of a given positive $\epsilon$ so that $|x^2 + 2x - 3| < \epsilon$ if $|x - 1| < \delta$.


20. Show that 


16 25 


- \x - 3| if 2 < x < 4. Hence, for any e > 0, find 5 so that 

juO 


< e if |x — 3| < 5, thus proving directly that lim ^ ~ 


1 1 

FTT6 25 

21. Show that $|(1 + x)^3 - 1| \le 7|x|$ if $-1 < x < 1$. Then prove by the definition (1.1-2) that $\lim_{x \to 0}(1 + x)^3 = 1$ (i.e., find a suitable $\delta$ for any given positive $\epsilon$).
22. Show that $\left| \frac{1}{x}\left( \frac{1}{2 + x} - \frac{1}{2} \right) + \frac{1}{4} \right| < \epsilon$ if $0 < |x| < \delta$, where $\delta$ is the smaller of the numbers 1, $4\epsilon$ ($\epsilon$ being positive). Translate this situation into a statement of the form $\lim_{x \to x_0} f(x) = A$, specifying what you take for $f$, $x_0$, and $A$.

23. Show that, if $0 < \epsilon < 1$, $10^{-1/x} < \epsilon$ when $0 < x < \left( \log_{10} \frac{1}{\epsilon} \right)^{-1}$. What is the corresponding statement about a limit?

2 ,/x + 3 

24. Does lim . exist? 

x— 0 Z +1 

25. Suppose that a function $f$ is defined by setting $f(x) = 1/n$ if $x = 1/2^n$, where $n = 1, 2, 3, \ldots$, and $f(x) = 0$ for all other values of $x$. Is $f$ continuous at $x = 0$?


1.11 / DERIVATIVES

Elementary calculus deals with the processes of differentiation and integration, 
the techniques of these processes as they pertain to various common functions, 
and the applications of the processes to problems of geometry, physics, and 
other sciences. Let us examine the concept of the derivative. 

Consider a function $f$, defined for values of the variable $x$ in the interval $a < x < b$. Let $x_0$ be any fixed point of the interval, and consider the ratio

$$\frac{f(x) - f(x_0)}{x - x_0},$$ (1.11-1)

where $x \neq x_0$ and $x$ is a variable point of the interval. The ratio (1.11-1) is called a difference quotient.


Definition. If the difference quotient (1.11-1) approaches a limit as $x$ approaches $x_0$, the limit is called the derivative of $f$ at $x = x_0$, and is denoted by $f'(x_0)$. Thus, by definition,

$$f'(x_0) = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0},$$ (1.11-2)

provided the limit exists.


Quite likely the student is familiar with another notation in connection with this definition. Sometimes we write $x = x_0 + \Delta x$, and then (1.11-2) takes the form

$$f'(x_0) = \lim_{\Delta x \to 0} \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}.$$ (1.11-3)

Here the symbol $\Delta x$ denotes an independent variable. For many algebraic calculations it is convenient to use $h$ in place of $\Delta x$. Also, we may drop the zero subscript; we then have the definition

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h},$$ (1.11-4)

provided the limit exists.

In addition to the notation $f'(x)$ for the derivative we frequently use the notation $\frac{d}{dx} f(x)$.

Example 1. Using form (1.11-2), calculate $f'(x_0)$ if $f(x) = x^2$. Here

$$f(x) - f(x_0) = x^2 - x_0^2 = (x - x_0)(x + x_0);$$

$$f'(x_0) = \lim_{x \to x_0} (x + x_0) = 2x_0.$$

Example 2. Using form (1.11-4), calculate $f'(x)$ if $f(x) = 1/x$. Here

$$f(x + h) - f(x) = \frac{1}{x + h} - \frac{1}{x} = \frac{-h}{(x + h)x},$$

$$f'(x) = \lim_{h \to 0} \frac{-1}{(x + h)x} = -\frac{1}{x^2}.$$
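
The limits in Examples 1 and 2 can be watched numerically. The sketch below assumes nothing beyond definition (1.11-4); the helper name `difference_quotient` and the chosen step sizes are illustrative, and for very small `h` floating-point rounding would eventually spoil the quotient.

```python
def difference_quotient(f, x, h):
    """The ratio (f(x + h) - f(x)) / h appearing in (1.11-4); h must be nonzero."""
    return (f(x + h) - f(x)) / h

def square(x):
    return x * x

def reciprocal(x):
    return 1.0 / x

# For x**2 at x = 3 the quotient approaches 2 * 3 = 6;
# for 1/x at x = 2 it approaches -1 / 2**2 = -0.25.
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(h, difference_quotient(square, 3.0, h), difference_quotient(reciprocal, 2.0, h))
```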


Definition . A function which has a derivative at a certain point is said to be 
differentiable at that point. 


In the definition (1.11-2) we were assuming that $f$ was defined in an interval extending some distance on each side of the point $x_0$. It is understood that $x$ may approach $x_0$ from either side, and that the limit of the difference quotient is the same when $x$ approaches $x_0$ from the left as when the approach is from the right.

It is useful to define one-sided derivatives. Using the notation for limits from the right and left, respectively, as explained just prior to Example 5 in §1.1, we define the right-hand derivative $f'_+(x_0)$ and the left-hand derivative $f'_-(x_0)$ as follows:

$$f'_+(x_0) = \lim_{x \to x_0+} \frac{f(x) - f(x_0)}{x - x_0},$$ (1.11-5)

$$f'_-(x_0) = \lim_{x \to x_0-} \frac{f(x) - f(x_0)}{x - x_0},$$ (1.11-6)

provided the limits exist.

If in discussing a function we confine our attention wholly to an interval $a \le x \le b$, then we shall understand that $f'(a)$ means $f'_+(a)$, and that $f'(b)$ means $f'_-(b)$. If $a < x_0 < b$, however, and if the function is differentiable at $x_0$, then we must have $f'_+(x_0) = f'_-(x_0)$. The derivative $f'(x_0)$ is then the common value of the two one-sided derivatives. For an example of a case in which the two one-sided derivatives exist but are unequal, see Exercise 12.

Example 3. Let $f(x) = \sqrt{2(1 - \cos 2x)}$. Show that this function is not differentiable at the points $x = 0, \pm\pi, \pm 2\pi, \ldots$ and find the one-sided derivatives at these points.

We recall the trigonometric identity

$$1 - \cos 2x = 2 \sin^2 x.$$
Fig. 5. Graph of $y = f(x)$, with the $x$-axis marked at $-2\pi$, $-\pi$, $0$, $\pi$, $2\pi$.


Thus*

$$f(x) = \sqrt{4 \sin^2 x} = 2|\sin x|.$$


This means that $f(x) = 2 \sin x$ when $\sin x \ge 0$; but $f(x) = -2 \sin x$ when $\sin x < 0$. The graph of $f(x)$ is shown in Fig. 5. The dotted portions represent the function $2 \sin x$ when $\sin x < 0$. From the symmetry of the figure we see that the situation at all the points $x = n\pi$ ($n = 0, \pm 1, \ldots$) is the same. The right-hand derivative at each of these points is 2, and the left-hand derivative is $-2$:

$$f'_+(0) = \lim_{x \to 0+} \frac{2 \sin x - 0}{x - 0} = 2,$$

$$f'_-(0) = \lim_{x \to 0-} \frac{-2 \sin x - 0}{x - 0} = -2.$$
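
The one-sided derivatives of Example 3 can be approximated by one-sided difference quotients. The following is a minimal sketch, assuming the simplified form $2|\sin x|$ of the function (used here because it avoids the loss of precision in $1 - \cos 2x$ for small $x$); the helper name `one_sided_quotient` is hypothetical.

```python
import math

def f(x):
    """Example 3 after simplification: sqrt(2(1 - cos 2x)) = 2|sin x|."""
    return 2.0 * abs(math.sin(x))

def one_sided_quotient(f, x0, h):
    """Difference quotient with x = x0 + h; h > 0 probes the right, h < 0 the left."""
    return (f(x0 + h) - f(x0)) / h

# Right-hand quotients tend to 2 and left-hand quotients to -2 at x0 = 0.
for h in (1e-2, 1e-4, 1e-6):
    print(h, one_sided_quotient(f, 0.0, h), one_sided_quotient(f, 0.0, -h))
```

Since the two one-sided values differ, no single limit of the difference quotient exists at these points.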


We take it for granted that readers of this book are acquainted with the interpretation of the derivative $f'(x)$ as the slope of the curve $y = f(x)$ when we employ a graphical representation in rectangular co-ordinates with equal scales on the two axes (see Fig. 6). To say that $f$ is differentiable at a point means geometrically that the curve $y = f(x)$ has at that point a unique tangent line which is not parallel to the $y$-axis.

We also take it for granted that the student is familiar with the interpretation 
of the derivative as an instantaneous rate of change. Without going into detail we 
emphasize the fact that the concepts of velocity, acceleration, and all kinds of 
instantaneous rates of change find their precise mathematical formulation in 
terms of the notion of the derivative of a function. 

Students beginning this book are expected to know the general rules of 
differentiation, including the rules for dealing with sums, products, and quo- 
tients. 


*In this book we adhere to the standard convention that if $A \ge 0$, $\sqrt{A}$ means the nonnegative square root of $A$. According to this convention $\sqrt{a^2} = a$ if $a \ge 0$, but $\sqrt{a^2} = -a$ if $a < 0$. Both cases are covered by the formula $\sqrt{a^2} = |a|$. Finally, $A^{1/2}$ and $\sqrt{A}$ are merely different notations for the same thing.

It will be convenient to draw up a list of differentiation formulas.


GENERAL RULES

$$\frac{d}{dx}[f(x) + g(x)] = f'(x) + g'(x).$$ (1.11-7)

$$\frac{d}{dx}[cf(x)] = cf'(x) \quad (c \text{ constant}).$$ (1.11-8)

$$\frac{d}{dx}[f(x)g(x)] = f(x)g'(x) + f'(x)g(x).$$ (1.11-9)

$$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{g(x)f'(x) - f(x)g'(x)}{[g(x)]^2}.$$ (1.11-10)


Each of these rules is in fact a theorem. We understand that the functions $f(x)$, $g(x)$ are defined on some interval $a < x < b$. Rule (1.11-7), when stated more fully as a theorem, may be expressed as follows: If $f(x)$ and $g(x)$ are differentiable for a particular value of $x$, then their sum is also differentiable for this value of $x$, and (1.11-7) holds. The student should amplify each of the rules (1.11-8)-(1.11-10) into a formally stated theorem in the same manner. What special provision must be made in connection with (1.11-10)?
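
As a sanity check on the product and quotient rules, (1.11-9) and (1.11-10), one can compare them with a numerical derivative. The sketch below is illustrative only: the choices $f = \sin$, $g = \exp$, the point $x = 0.7$, and the helper `central_derivative` (a symmetric difference quotient) are assumptions made for the example, not part of the text.

```python
import math

def central_derivative(F, x, h=1e-5):
    """Symmetric difference quotient, a numerical stand-in for F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

f, df = math.sin, math.cos    # illustrative f and its known derivative
g, dg = math.exp, math.exp    # illustrative g (never zero) and its known derivative

x = 0.7
product_rule = f(x) * dg(x) + df(x) * g(x)                   # right side of (1.11-9)
quotient_rule = (g(x) * df(x) - f(x) * dg(x)) / g(x) ** 2    # right side of (1.11-10)

numeric_product = central_derivative(lambda t: f(t) * g(t), x)
numeric_quotient = central_derivative(lambda t: f(t) / g(t), x)
print(numeric_product, product_rule)
print(numeric_quotient, quotient_rule)
```

The example uses a denominator that never vanishes; a $g$ with zeros would require excluding those points.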

We mentioned at the end of §1.1 that a function may be continuous and yet not differentiable. For instance, the function of Example 3 is continuous for all values of $x$, but it is not differentiable at the points $n\pi$, $n = 0, \pm 1, \pm 2, \ldots$. However, differentiability does imply continuity as the following theorem shows.


THEOREM I. If $f$ is differentiable at $x_0$, it is continuous there.

Proof. When $x \neq x_0$ we can write

$$f(x) = \frac{f(x) - f(x_0)}{x - x_0}(x - x_0) + f(x_0).$$

Then, by (1.1-4) and (1.1-5),

$$\lim_{x \to x_0} f(x) = f'(x_0) \cdot 0 + f(x_0) = f(x_0).$$

This completes the proof.


The student must already be accustomed to using the rule for differentiating a composite function (sometimes called the chain rule). By a composite function we mean a function formed by substitution of one function in place of the independent variable in another function:

$$F(t) = f[g(t)].$$ (1.11-11)

To fix the ideas precisely, suppose that $g$ is defined when $\alpha < t < \beta$, and that the values of the function satisfy the inequality $a < g(t) < b$. Suppose that $f$ is defined when $a < x < b$. Then, replacing $x$ by $g(t)$, we obtain the composite function $F$ defined by (1.11-11).

THEOREM II. Suppose $g$ is differentiable at a point $t_0$ of the interval $\alpha < t < \beta$. Let $x_0 = g(t_0)$, and suppose that $f$ is differentiable at $x_0$. Then the composite function $F$ is differentiable at $t_0$, and

$$F'(t_0) = f'(x_0)g'(t_0).$$ (1.11-12)

In elementary calculus this theorem is often expressed symbolically in a different way, by writing

$$y = f(x), \quad x = g(t).$$

Then

$$\frac{dy}{dt} = \frac{dy}{dx} \frac{dx}{dt}.$$

Example 4. Suppose $f(x) = x^{17}$, $g(t) = t - t^2$. Then

$$F(t) = (t - t^2)^{17} \quad \text{and} \quad F'(t) = 17(t - t^2)^{16}(1 - 2t).$$
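
The chain-rule result of Example 4 can be compared with a numerical derivative. This is a hedged sketch: the point $t = 2$ and the helper `central_derivative` are illustrative choices, and agreement at one point is a check, not a proof.

```python
def F(t):
    """Composite function of Example 4: F(t) = (t - t**2)**17."""
    return (t - t * t) ** 17

def F_prime(t):
    """Derivative given by the chain rule: 17 (t - t**2)**16 (1 - 2t)."""
    return 17 * (t - t * t) ** 16 * (1 - 2 * t)

def central_derivative(G, t, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for G'(t)."""
    return (G(t + h) - G(t - h)) / (2 * h)

t = 2.0
numeric = central_derivative(F, t)
exact = F_prime(t)
print(numeric, exact)
```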

We accept Theorem II as known from elementary calculus. The proof is a 
somewhat delicate matter, however, and the student who wishes to study the 
proof will find a discussion of it in Exercise 26 at the end of this section. 

Throughout calculus there are two aspects of the development of the 
subject. On the one hand we formulate concepts and rules applicable to arbitrary 
functions having certain properties. Theorems I and II are of this type. On the 
other hand there are the particular functions which we deal with as illustrations 
and in all practical applications, e.g., 

(1 - x 2 ) 1/2 , sin 2x, tan -1 x, log x, e ~ xli 2 , 

and many others. We assume that the student knows the formulas for differen- 
tiation of the standard elementary functions, and in general we shall regard all 
such functions as being available for illustrative purposes. 

In order to illustrate the possibility of various kinds of situations which do 
not ordinarily arise with the standard elementary functions, we sometimes resort 
to the contrivance of functions specifically defined so as to exhibit some 
peculiarity. Such specially contrived functions serve to help the student ap- 
preciate the generality of the concept of a function. They also teach him to be 
wary of tacitly assuming more than is implied in a given definition or hypothesis. 

Example 5. Let a function be defined as follows:

$$f(x) = x^2 \sin\frac{1}{x} \quad \text{if } x \neq 0,$$

$$f(0) = 0.$$

We shall see that this function is differentiable for all values of $x$, but that the calculation of its derivative at $x = 0$ requires special attention.

We note first of all that the formula by which $f(x)$ is defined when $x \neq 0$ cannot be used when $x = 0$, since $1/x$ is then undefined. Hence the assignment of the value of $f(0)$ may be made as we choose. The value $f(0) = 0$ is chosen because this makes the function continuous at $x = 0$; that is,

$$\lim_{x \to 0} x^2 \sin\frac{1}{x} = 0.$$ (1.11-13)

The correctness of this result is seen from the inequality

$$\left| x^2 \sin\frac{1}{x} \right| \le x^2,$$ (1.11-14)

which holds since the value of the sine function never exceeds unity.

To find the derivative, we follow standard procedures in writing

$$f'(x) = x^2 \left( \cos\frac{1}{x} \right) \left( -\frac{1}{x^2} \right) + 2x \sin\frac{1}{x},$$

$$f'(x) = -\cos\frac{1}{x} + 2x \sin\frac{1}{x}.$$ (1.11-15)

This result is correct when $x \neq 0$. If $x = 0$, however, the foregoing procedure for finding $f'(x)$ is not valid, for it is based on the rule for the derivative of the product of two functions, namely $x^2$ and $\sin(1/x)$; the second of these functions is not defined at $x = 0$, and cannot be defined there so as to be differentiable.

As yet, then, we do not know whether $f(x)$ is differentiable at $x = 0$. Now, by definition,

$$f'(0) = \lim_{x \to 0} \frac{f(x) - f(0)}{x - 0},$$

provided the limit exists. Since $f(0) = 0$,

$$\frac{f(x) - f(0)}{x - 0} = \frac{f(x)}{x} = x \sin\frac{1}{x},$$

and we see that

$$f'(0) = \lim_{x \to 0} x \sin\frac{1}{x} = 0.$$ (1.11-16)

It is worth pointing out that $f'(x)$ is not continuous at $x = 0$; for from (1.11-15) we see that as $x \to 0$, $f'(x)$ approaches no limit but oscillates infinitely often from $-1$ to $+1$.
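
Both conclusions of Example 5 can be illustrated numerically. The following is a minimal sketch under the assumption that `f_prime` encodes formula (1.11-15) for $x \neq 0$ together with the value $f'(0) = 0$ from (1.11-16); the sample points $1/(2n\pi)$ and $1/((2n+1)\pi)$ are illustrative choices at which $\cos(1/x)$ is $+1$ and $-1$.

```python
import math

def f(x):
    """Example 5: x**2 * sin(1/x) for x != 0, with f(0) = 0."""
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

def f_prime(x):
    """Formula (1.11-15) for x != 0, together with f'(0) = 0 from (1.11-16)."""
    return -math.cos(1.0 / x) + 2 * x * math.sin(1.0 / x) if x != 0 else 0.0

# The difference quotient at 0 equals x*sin(1/x), bounded by |x|, so it tends to 0.
for x in (1e-1, 1e-3, 1e-5):
    print(x, (f(x) - f(0)) / x)

# Yet f' takes values near -1 and +1 at points arbitrarily close to 0.
for n in (10, 100, 1000):
    x_even = 1.0 / (2 * n * math.pi)         # cos(1/x) = 1, so f'(x) is near -1
    x_odd = 1.0 / ((2 * n + 1) * math.pi)    # cos(1/x) = -1, so f'(x) is near +1
    print(f_prime(x_even), f_prime(x_odd))
```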

A graph is helpful in visualizing the nature of the function $f$. The student should construct such a graph, using the method of multiplication of ordinates. The curve $y = f(x)$ oscillates between the curves $y = x^2$, $y = -x^2$, crossing the $x$-axis at the points $\pm\frac{1}{\pi}, \pm\frac{1}{2\pi}, \pm\frac{1}{3\pi}, \ldots$.

In concluding this section we point out a certain principle of reasoning about limits which we used in arriving at (1.11-13) and (1.11-16). It is the following: If two functions $F$, $G$ satisfy an inequality of the form

$$A \le F(x) \le G(x),$$ (1.11-17)

where $A$ is some fixed number, and if $\lim_{x \to x_0} G(x) = A$, then $\lim_{x \to x_0} F(x) = A$ also.

For example, in applying this principle to arrive at (1.11-16) we put $F(x) = |x \sin(1/x)|$, $G(x) = |x|$, $A = 0$, $x_0 = 0$. The principle just stated is a special case of Theorem XII, §1.61. Its truth is an immediate consequence of the definition of a limit.
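
The step from the definition of a limit to this principle can be spelled out briefly. The following is a sketch of one way to write it, assuming the inequality (1.11-17) holds for all $x$ near $x_0$ with $x \neq x_0$; the symbols $\epsilon$ and $\delta$ are those of the definition of a limit.

```latex
% Sketch: why A <= F(x) <= G(x) and lim G = A force lim F = A.
Given $\epsilon > 0$, choose $\delta > 0$ so that
$|G(x) - A| < \epsilon$ whenever $0 < |x - x_0| < \delta$.
For such $x$, the inequality $A \le F(x) \le G(x)$ gives
\[
  0 \le F(x) - A \le G(x) - A < \epsilon ,
\]
so $|F(x) - A| < \epsilon$, which is the statement
$\lim_{x \to x_0} F(x) = A$.
```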


EXERCISES 

1. (a) If $f(x) = x^n$, where $n$ is a positive integer, compute $f'(x_0)$, using (1.11-2) and a factorization theorem. (b) Compute $f'(x)$, using (1.11-4) and the binomial theorem.

2. Suppose $p$ and $q$ are positive integers without a common factor. Let $f(x) = x^{p/q}$. Suppose $x$, $x_0$ are positive, and write $u = x^{1/q}$, $u_0 = x_0^{1/q}$. Verify that

$$\frac{f(x) - f(x_0)}{x - x_0} = \frac{u^p - u_0^p}{u^q - u_0^q}.$$

Proceed from here to show directly that $f'(x_0) = nx_0^{n-1}$, where $n = p/q$.

3. Let $f(x) = x^c$, where $x > 0$ and $c$ is irrational. Since $c$ cannot be expressed as a ratio of integers, the method of Exercise 2 is not available for calculating $f'(x)$. However, assuming as known the differentiability properties of the exponential and logarithmic functions, show that the formula $f'(x) = cx^{c-1}$ may be deduced from the fact that $x^c = e^{c \log x}$.

4. Let $f(x) = \sin x$, $g(x) = \cos x$. Show that finding $f'(0)$ is the same as finding $\lim_{x \to 0} \frac{\sin x}{x}$, and that finding $g'(0)$ is the same as finding $\lim_{x \to 0} \frac{\cos x - 1}{x}$. Taking for granted that these limits have values 1, 0 respectively, deduce the formula $f'(x) = \cos x$, using (1.11-4) and the expansion formula for $\sin(x + h)$.

5. Show that the formula $g'(x) = -\sin x$ may be derived from the relations $f'(x) = \cos x$, $\cos x = \sin\left(\frac{\pi}{2} - x\right)$. ($f$ and $g$ are defined in Exercise 4.) What theorem do you use?

6. Let $f(x) = \log x$. Using only the definition of the derivative and standard properties of the logarithm function, explain why $f'(1) = \lim_{h \to 0} \log(1 + h)^{1/h}$.

7. If $f(x) = \log x$, show that

$$\frac{f(x + h) - f(x)}{h} = \frac{1}{x} \cdot \frac{f(1 + t) - f(1)}{t},$$

where $t = h/x$. Hence explain why $f$ is differentiable at $x$ if it is differentiable at 1, and show that $f'(x) = \frac{f'(1)}{x}$.

8. Let $f(x) = e^x$. Explain why $f'(0) = \lim_{x \to 0} \frac{e^x - 1}{x}$. Show that, if $f'(0) = 1$ is known, one can deduce $f'(x) = e^x$ with the aid of the laws of exponents. Start from (1.11-4).

9. The radius of a sphere is being increased at a variable rate. This rate is 2 centimeters per second when the radius is 5 centimeters. Find the rate of change of the volume of the sphere, in cubic centimeters per second, at this particular moment.

10. A rocket is being launched straight upward from the earth. It burns liquid fuel at 
a variable rate, the rate being N gallons per mile when the rocket is 10 miles high. If the 
speed of the rocket at this time is 1000 miles per hour, what is the instantaneous rate of 
fuel consumption in gallons per hour? Let x be the altitude of the rocket t hours after 
launching. Suppose the rate of fuel consumption is $kx^{-1/2}$ gallons per mile and $3k(ct)^{1/2}$ gallons per hour where $c$ and $k$ are positive constants. Find a formula for the rate of rise
of the rocket, and deduce the formula connecting x and t. 

11. If $f$ is the function of Example 5 and $F(t) = f(t^2 - 1)$, find $F'(1)$.

12. Show that $f(x) = |x|$ is not differentiable at $x = 0$. Is it continuous? Find $f'_+(0)$ and $f'_-(0)$.

13. Let $f(x) = e^{-|x|}$. (a) Graph this function. (b) Is it continuous at $x = 0$? (c) Is it differentiable there?

14. Let $f(x) = x|x|$. (a) Graph the function. (b) Find $f'(x)$ if $x > 0$; if $x = 0$; if $x < 0$. (c) Is the derivative $f'$ differentiable at $x = 0$?

15. (a) If f(x) = [x], compute /'(I) by (1. 1 1-4). (b) How does Theorem I show that / 

is not differentiable at x = 2? (c) What is the value of /i(2)? (d) How does 

/(2 + h) — f(2) L , n o 

— r — behave as h-*0— ? 

h 

16. Discuss the continuity and differentiability at $x = 0$ of $f$, where $f(x) = x \sin(1/x)$ when $x \neq 0$, $f(0) = 0$.

17. Show that the function defined as $f(x) = x^3 \sin(1/x)$ if $x \neq 0$, $f(0) = 0$, has a derivative for all values of $x$, and that $f'$ is continuous at $x = 0$ but not differentiable there.

18. (a) For what values of the exponent $n$ (an integer) will $f'(x)$ exist at $x = 0$ if $f(x) = x^n \sin(1/x^2)$ when $x \neq 0$, and $f(0) = 0$? (b) For what values of $n$ will $f'$ be continuous at $x = 0$? (c) For what values of $n$ will $f'$ be differentiable at $x = 0$?

19. Let /(x) = if x^ 0, /( 0) = 0. Does /'( 0) exist? Sketch the graph near x = 0, 

showing the directions from which a point approaches the origin along the curve. 

20. Discuss the differentiability at $x = 1$ of $f(x) = (x - 1)[x]$. Draw the graph when $0 \le x \le 2$.


21. If f(x) = [x] + (x - [x]) 1/2 , sketch the graph when 0^x^3. What can you say 
about continuity and differentiability of / at x = 1 and x = 2? Write a formula for /(x) 
without square brackets when 0 <x < 1, and use it to find f(\). 

22. Let $f$ be a function which is defined for all $x$, with the properties (i) $f(a + b) = f(a)f(b)$, (ii) $f(0) = 1$, (iii) $f$ is differentiable at $x = 0$. Show that $f$ is differentiable for all values of $x$, and that $f'(x) = f'(0)f(x)$.

23. Let functions $f$ and $g$ be defined for all $x$ and possess the following properties: (i) $f(x + y) = f(x)g(y) + f(y)g(x)$, (ii) $f$ and $g$ are differentiable at $x = 0$, with $f(0) = 0$, $f'(0) = 1$, $g(0) = 1$, $g'(0) = 0$. Show that $f$ is differentiable for all values of $x$, with $f'(x) = g(x)$. If it is also known that $g(x + y) = g(x)g(y) - f(x)f(y)$, show that $g$ is differentiable for all values of $x$, with $g'(x) = -f(x)$.


24. Suppose /(^r) = 2^ 
differentiable at x = 0? What 


, n = 1,2, . . . , and /(x) = 0 for all other values of x. Is 
is the situation if instead we define 


/ 
25. Construct proofs for rules (1.11-7)-(1.11-10), using (1.11-4).

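26. Given $f$, $g$, and $F$ as in (1.11-11) and Theorem II, consider any nonzero value of $\Delta t$ so small that $\alpha < t_0 + \Delta t < \beta$, and define

$$\Delta x = g(t_0 + \Delta t) - g(t_0), \quad \Delta y = f(x_0 + \Delta x) - f(x_0).$$

Then define

$$\epsilon = \frac{\Delta y}{\Delta x} - f'(x_0) \text{ if } \Delta x \neq 0 \quad \text{and} \quad \epsilon = 0 \text{ if } \Delta x = 0.$$

Show that

$$\frac{F(t_0 + \Delta t) - F(t_0)}{\Delta t} = \frac{\Delta y}{\Delta t} = [f'(x_0) + \epsilon] \frac{\Delta x}{\Delta t}.$$

Explain carefully why $\Delta x$ and $\epsilon$ approach 0 as $\Delta t \to 0$, and then explain how you can carry through the proof of Theorem II.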


1.12 / MAXIMA AND MINIMA 

One of the important things about the derivative is that it helps us to locate the 
relative maxima and minima of a function. Let us formulate exactly what can be 
said about such things. 

It is necessary to say what we mean by an open interval of the $x$-axis. If $a$ and $b$ are numbers such that $a < b$, all numbers $x$ such that $a < x < b$ form what is called the open interval from $a$ to $b$, or more briefly, the open interval $(a, b)$. The end points $a$, $b$ do not belong to the open interval. By contrast, the set of all numbers $x$ such that $a \le x \le b$ is called a closed interval; here the end points belong to the interval. We shall denote closed intervals by the use of square brackets: $[a, b]$; for open intervals we shall use ordinary parentheses: $(a, b)$. By a neighborhood of $x_0$ we mean an open interval containing $x_0$.

Definition. Let $f$ be a function which is defined on an open interval $(a, b)$, and let $x_0$ be a point of the interval. We say that $f$ has a relative maximum at $x_0$ if there is some neighborhood of $x_0$ (say $(a_1, b_1)$, where $a_1 < x_0 < b_1$) contained in $(a, b)$ and containing $x_0$ such that $f(x) \le f(x_0)$ if $a_1 < x < b_1$. This means that $f(x_0)$ is at least as large (algebraically) as $f(x)$ at all points $x$ for some distance on either side of $x_0$ (see Fig. 7).

Fig. 7. A relative maximum.

A similar definition is made for a relative minimum, the inequality being reversed: $f(x) \ge f(x_0)$ when $x$ is near $x_0$.

Both a relative maximum and a relative minimum are covered by the term “a 
relative extremum .” 

THEOREM III. Suppose that f has a relative extremum at the point x 0 of the 
open interval (a, b) and suppose that f is differentiable at x 0 . Then f'(x 0 ) = 0. 
Proof. For definiteness assume that $f$ has a relative maximum at $x_0$. Consider the one-sided derivatives at $x_0$, and bear in mind that $f(x) \le f(x_0)$ when $x$ is sufficiently near $x_0$. Then $f(x) - f(x_0) \le 0$; accordingly

$$\frac{f(x) - f(x_0)}{x - x_0} \le 0 \quad \text{when } x > x_0,$$

and so (see (1.11-5))

$$f'_+(x_0) \le 0.$$

Likewise

$$\frac{f(x) - f(x_0)}{x - x_0} \ge 0 \quad \text{when } x < x_0,$$

so that $f'_-(x_0) \ge 0$ (see (1.11-6)). But, since $f$ is differentiable at $x_0$, we have $f'(x_0) = f'_+(x_0) = f'_-(x_0)$. Therefore $f'(x_0) = 0$, for it is neither positive nor negative. The proof for the case of a relative minimum is entirely similar.


It may happen, of course, that a function has a relative extremum at a point $x_0$, but is not differentiable there. This happens with $f(x) = 1 - x^{2/3}$ at $x = 0$, and with $f(x) = |x - 1|$ at $x = 1$. See Fig. 8, also.

The proof of Theorem III rests on the following reasoning about limits: If a variable quantity is $\le 0$ and approaches the limit $A$, then $A \le 0$; likewise, if the variable quantity is $\ge 0$ and approaches the limit $B$, then $B \ge 0$. This principle is considered further in §1.61 (Theorem X).

In defining a relative extremum we compared the value $f(x_0)$ with values $f(x)$ at points $x$ on both sides of $x_0$. Theorem III applies only when $x_0$ is an interior point of the interval on which we are examining the values of $f$. By contrast, let us consider a function which is defined only when $a \le x \le b$, and suppose that $f(a)$ is greater than $f(x)$ when $x$ is near $a$ on the right. Then, if the right-hand derivative $f'_+(a)$ exists, we can infer that $f'_+(a) \le 0$, but not that $f'_+(a) = 0$ (see Fig. 9). We leave it for the student to draw the appropriate conclusion about $f'_-(b)$ if $f(x) \le f(b)$ when $x$ is near $b$ on the left.

Of course, the mere fact that $f'(x_0) = 0$ does not guarantee that $f$ has a relative extremum at $x_0$. This is illustrated by $f(x) = x^3$ at $x = 0$, where the graph has a horizontal tangent but the function has neither a relative maximum nor a relative minimum. If $f$ is differentiable at $x_0$, $f'(x_0) = 0$ is a necessary, but not a sufficient, condition for $f$ to have a relative extremum at $x_0$.

Fig. 8. A relative minimum.

Fig. 9.

There are tests which are sufficient, but not necessary, for a relative 
extremum at x 0 , and which at the same time provide a means of distinguishing a 
relative maximum from a relative minimum. We postpone consideration of such 
tests to later sections. See, for instance, Example 8, §1.2.

In many problems we are interested in a function which is defined over some given interval, and we wish to find the largest (or smallest) value which the function assumes on the given interval. The interval may be closed, e.g., $0 \le x \le 4$, or open, e.g., $1 < x < 3$, or neither. Examples of intervals which are neither open nor closed are furnished by inequalities such as $0 < x \le 10$ and $2 \le x < 8$ (open at one end and closed at the other). Also, we may be interested in finding the greatest or least value of $f(x)$ for an infinite range of values of $x$. For example, let $x$ denote the altitude of a right circular cone circumscribed about a sphere of radius $b$, and let $f(x)$ denote the volume of the cone. One finds without much trouble that

$$f(x) = \frac{\pi b^2 x^2}{3(x - 2b)}.$$

The significant values of $x$ are those for which $x > 2b$, since if $x \le 2b$ there can be no cone of altitude $x$ circumscribed about the sphere. For further consideration of this problem see Exercise 10, page 25.

There may or may not be an absolute maximum or an absolute minimum in a 
given situation. This will depend on the nature of the function and the interval 
which are involved. 

Example 1. $f(x) = x$; interval $0 < x \le 10$.

Here there is no absolute minimum, since $f(x)$ can be as near 0 as we please, but never attains that value on the specified interval. There is an absolute maximum, occurring at $x = 10$.

Example 2. $f(x) = x^2$; interval $-1 \le x < 2$.

Here there is an absolute minimum at $x = 0$; there is no absolute maximum on the interval, for $f(x)$ can approach but never reach the value 4 when $x$ is restricted to the specified interval.

Example 3. $f(x) = \tan x$; interval $-\pi/2 < x < \pi/2$.

Here there is neither absolute maximum nor absolute minimum.

There is a very important theorem to the following effect: 

If f is defined and continuous at each point of the finite closed interval a ^ x ^ b, 
then at some point of the interval f(x) attains an absolute maximum value. 
Likewise , at some point of the interval f(x) attains an absolute minimum value. 

This theorem is taken up carefully and proved in §3.2. For the present we shall 
accept the theorem and strive to appreciate its usefulness. The requirement that 
the interval be finite and closed is quite essential, for in the absence of these
limitations f(x) might not attain any absolute extreme values, as we see by
Examples 1-3.

In practice the functions we are interested in are usually differentiable at all 
points of the interval (there may sometimes be isolated exceptional points). If an 
absolute extreme value occurs at an interior point of the interval, it is also a 
relative extreme value in the sense of Theorem III, and therefore we must have 
f'(x) = 0 at the point, provided f is differentiable. We therefore have the
following guiding principle in searching for points at which f(x) can attain an
absolute maximum or minimum value: Suppose f is differentiable on the given
interval, except perhaps at a finite number of points, and suppose it is known that
an absolute maximum (or minimum) value is actually attained. Then the point of 
attainment is either 


(a) a point where f'(x) = 0, 

(b) a point at one end of the interval, 

or (c) a point where f is not differentiable. 

In the common type of problem studied in elementary calculus, the solution is 
usually found under (a). In fact, it usually happens with physical or geometrical 
problems that there is only one interior point of the given interval where 
f'(x) = 0. Solutions under (b) do occur sometimes, even in physical problems, 
and a carefully reasoned solution should always take account of the situation at 
the ends of the interval, perhaps even before computing f'(x). The situation (c)
may occur also, but it is rarer in common practice.

Example 4. Find a number x between 0 and 1 such that f(x) = 2/x + 8/(1 − x) is as
small as possible.

We observe that f(x) > 0 when 0 < x < 1; f is continuous in the specified
open interval. Also, f(x) becomes very large (in fact f(x) → +∞) as x approaches
either end of the interval. We conclude that the graph of y = f(x) near the ends of
the interval has an appearance somewhat as shown in Fig. 10. It follows from
this reasoning that if we choose a closed interval a ≤ x ≤ b, with a > 0 and very
near 0, and b < 1 and very near 1, the function f will have smaller values in the
interior of the interval [a, b] than it has in the rest of the interval (0, 1). Since f is
continuous on the finite closed interval [a, b], it must attain a value at some
point of [a, b] which is an absolute minimum among all the values occurring on
the interval. This absolute minimum will also be an absolute minimum among all
the values of f occurring on the open interval (0, 1). Now f is differentiable in
(0, 1); hence the required point of absolute minimum must be a point at which
f'(x) = 0. We therefore proceed to compute the derivative and solve the equation
f'(x) = 0:

$$ f'(x) = \frac{-2}{x^2} + \frac{8}{(1 - x)^2} = \frac{6x^2 + 4x - 2}{x^2 (1 - x)^2}, $$

$$ 3x^2 + 2x - 1 = 0, \qquad x = \tfrac{1}{3} \ \text{or} \ x = -1. $$

The solution in the interval (0, 1) is x = 1/3. We conclude that f(x) attains its
minimum value at x = 1/3; the minimum value is 18. Observe that no test, by
second derivatives or otherwise, is necessary to distinguish between a maximum
and a minimum in this case, since we know that a minimum value exists, and
there is only one point in the interval at which f'(x) = 0.
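The conclusion of Example 4 can be checked with exact rational arithmetic. The sketch below is only an illustration and not part of the original argument: it assumes the function is f(x) = 2/x + 8/(1 − x), and the names `f`, `f_prime_numerator`, `x_star` and the sampling grid are choices made here.

```python
from fractions import Fraction


def f(x):
    """Function of Example 4, assumed to be f(x) = 2/x + 8/(1 - x) on 0 < x < 1."""
    return 2 / x + 8 / (1 - x)


def f_prime_numerator(x):
    """Numerator 6x^2 + 4x - 2 of f'(x); the denominator x^2 (1 - x)^2 is positive on (0, 1)."""
    return 6 * x * x + 4 * x - 2


# Candidate from the text: the root of 3x^2 + 2x - 1 = 0 that lies in (0, 1).
x_star = Fraction(1, 3)
minimum_value = f(x_star)

# Illustrative sample of the open interval (hypothetical grid; x = 1/3 is not on it).
grid = [Fraction(k, 1000) for k in range(1, 1000)]
smallest_on_grid = min(f(x) for x in grid)

print(f_prime_numerator(x_star), minimum_value, float(smallest_on_grid))
```

Sampling proves nothing by itself; the existence argument in the text is what guarantees that the single critical point is the absolute minimum. The grid merely confirms that no sampled value falls below f(1/3).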

We emphasize that the assurance of the existence of an absolute minimum in 
Example 4 is based on use of the theorem cited following Example 3 on page 22. 
Likewise, in Example 5 (following), the existence of an absolute maximum is 
assured by the same theorem. 

Example 5. If f(x) = x(4 − x){(x − 2)² + 2}, find the absolute maximum value
of f(x) when 0 ≤ x ≤ 4.

Here we see that f is differentiable (and continuous) for all values of x. 
Moreover, f(0) = f(4) = 0, and f(x) > 0 if 0 < x < 4. There must be an absolute
maximum for f somewhere on the closed interval [0, 4], and it clearly does not
occur at either end of the interval. Hence it must occur at some interior point
where f'(x) = 0. A simple calculation shows that

$$ f(x) = -x^4 + 8x^3 - 22x^2 + 24x, $$
$$ f'(x) = -4x^3 + 24x^2 - 44x + 24, $$
$$ f'(x) = -4(x - 1)(x - 2)(x - 3). $$

There are three points where f'(x) = 0: x = 1, 2, and 3. Calculation shows that

$$ f(1) = 9, \quad f(2) = 8, \quad f(3) = 9. $$

Hence the absolute maximum value of f on [0, 4] is 9, occurring at x = 1 and
x = 3. All of this shows up clearly on a graph, which the student should 
construct for himself. It is noteworthy, however, that the reasoning is conclusive 
without the graph. 
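The guiding principle (critical points, end points, points of non-differentiability) turns into a short finite search once the candidates are known. The following sketch applies it to Example 5; the helper name `absolute_maximum` is hypothetical, and searching only the integers 0..4 for zeros of f' is a shortcut that works here because the text has already factored f'.

```python
def f(x):
    """Polynomial of Example 5: f(x) = x(4 - x)((x - 2)^2 + 2)."""
    return x * (4 - x) * ((x - 2) ** 2 + 2)


def f_prime(x):
    """Factored derivative from the text: f'(x) = -4(x - 1)(x - 2)(x - 3)."""
    return -4 * (x - 1) * (x - 2) * (x - 3)


def absolute_maximum(func, candidates):
    """Return the largest value of func over the candidates and the points where it occurs."""
    best = max(func(x) for x in candidates)
    return best, [x for x in candidates if func(x) == best]


endpoints = [0, 4]
# The cubic f' has exactly three zeros, and all of them happen to be integers in [0, 4].
critical_points = [x for x in range(0, 5) if f_prime(x) == 0]
max_value, max_points = absolute_maximum(f, endpoints + critical_points)

print(max_value, max_points)
```

Since f is differentiable everywhere, case (c) of the principle contributes no candidates, so the five points examined are the only places where the maximum can occur.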


EXERCISES 

1. (a) Find all the points of relative maxima and minima and sketch the graph of
f(x) = (x + 5)²(x³ − 10). (b) Find the absolute maximum and minimum values of f(x) on
the interval −6 ≤ x ≤ −2.

2. Find the algebraically largest and smallest values of f(x) = 20x³ − 2700x + 7000
when 0 ≤ x ≤ 10.

3. Find the absolute maximum of f(x) = (x² − 75)/(x − 10) for 0 ≤ x < 10. Begin by
sketching the graph enough to show why a maximum must be attained in this interval. 

4. Consider f(x) = x/(x² + a²)^(3/2) for x ≥ 0. Explain why f(x) must attain an absolute
maximum value at some point. Find the point.

5. Find the absolute maximum of sin² 2θ (1 + cos 2θ) for 0 ≤ θ ≤ π/2.

6. Discuss the possible absolute extrema of f(x) = x⁴ + (256/x²) for 0 < x ≤ 4, and
find any that exist. 

7. Consider the function (27/sin x) + (64/cos x), 0 < x < π/2. Why must there be an
absolute minimum in this open interval? Find where it occurs, and the corresponding
value of the function.

8. Without drawing the graph, find the absolute maximum and minimum values of 
the function (a) 5 cos 3 x - 3 cos x; (b) 2 sin x - 1 + 2 cos 2 x. 

9. A right circular cone of altitude x is inscribed in a sphere of radius c. Express the 
volume of the cone as a function of x. What open interval of values of x is of significance 
in considering nondegenerate cones? Explain why there must be an absolute maximum 
volume for some x in this interval, and find the x for which the maximum is attained.

10. Let V be the volume of a right circular cone of altitude x circumscribed about a 
sphere of radius b. Show that V attains its absolute minimum when x = 4b. Write the 
argument out fully and carefully after the manner of the discussion of Example 4. 

11. A right circular cylinder with radius of base x is inscribed in a right circular cone
with radius of base r and altitude h. (a) Express the total surface area of the cylinder
(including ends) as a function of x. (b) What interval of x-values is of significance if a
nondegenerate cylinder is wanted? (c) What inequality must be satisfied by r and h if
there is to be a nondegenerate cylinder of maximum area? (d) What is the situation if
h = 6, r = 2? if h = 4, r = 2? if h = 4, r = 3?

12. Consider f(x) =        if 0 < x < 1. (a) Complete the definition of f at x = 0, 1
so as to make f continuous on [0, 1]. (b) Find the absolute maximum and minimum of
f(x) on [0, 1] after completing (a). (c) Sketch the graph of y = f(x).

13. Let f(x) = 5√(16 + x²) + 4√((3 − x)²) (taking the positive square root in both cases,
so that √((3 − x)²) = |3 − x|). Note that f is differentiable except when x = 3 and that
f(x) → +∞ as x → +∞ or x → −∞. (a) How do you infer from this that f must attain an
absolute minimum value? (b) Find formulas for f'(x) according as x < 3 or x > 3, and
show that f'(x) < 0 if x < 3, while f'(x) > 0 if x > 3. (c) What do you infer about the
point of attainment of the minimum? (d) What is the minimum value of f(x)?

14. A spring is located at (0, a), and a man’s house is located at (b, 0), where 
a >0, b >0. A pipeline is to be laid in two straight parts, the first part from the spring to 
(x, 0), and the second part from (x, 0) to the house. The two parts will cost c₁ and c₂ dollars
per unit length, respectively.

(a) Show that the total cost of the pipeline, for any value of x, is

$$ f(x) = c_1 \sqrt{a^2 + x^2} + c_2 |b - x|. $$

(b) Show that f has its absolute minimum value for some x such that 0 < x ≤ b.

HINT: Consider the sign of f'(x) when x ≤ 0 and x > b respectively.

(c) Find the inequalities which must be satisfied by c₁, c₂, a, and b if f is to attain its
minimum for an x such that 0 < x < b.

Observe that Exercise 13 is a special case of this exercise. A contrasting special case is
afforded by taking c₁ = 5, c₂ = 4, a = 3, b = 5. If one does these two special cases first, the
general problem will be more interesting.

15. Discuss the intervals of definition of the function 

$$ f(x) = \{(16 - x^2)(x^2 - 9)\}^{1/2}, $$
and find the absolute maximum of the function. 

16. A man wishes to get from point A to point B, these points being diametrically 
opposite each other on the shores of a circular pond. The man can row \\ miles per hour 
and walk 5 miles per hour.

(a) If there is a boat available at A, what combination of rowing and walking will take
him to B in the least possible time?

(b) Discuss the problem in case the rowing and walking speeds are, respectively, u and v
miles per hour. Check your results carefully in the special case u = 2, v = 4.

17. Consider a and b as fixed, with b < a. Let c be a variable such that b < c < a.
Let φ be the acute angle between the tangents to the circle x² + y² = c² and the ellipse
b²x² + a²y² = a²b² at a point of intersection. Find tan φ when c is chosen so that φ is
greatest.

18. Write out the proof of Theorem III for the case of a relative minimum.

Show that, if f has a relative minimum at x₀, and g(x) = −f(x), then g has a relative
maximum at x₀. Hence deduce the proof for the case of a minimum from the facts already
established for a maximum.


1.2 / THE LAW OF THE MEAN (THE 
MEAN-VALUE THEOREM FOR DERIVATIVES) 

The theorem which goes by the name of the law of the mean is one of the most 
important theoretical results in the subject of differential calculus. It is used as a 
tool in many places in the later developments of calculus, both differential and 
integral, particularly in connection with proofs. We wish to emphasize very 
strongly that the student of advanced calculus needs to gain an appreciation of 
the power of the law of the mean as an instrument of systematic reasoning. The 
first step should be to become thoroughly familiar with the content of the law 
itself. 

THEOREM IV (The law of the mean). Let f be a function which is continuous at
each point of the closed interval a ≤ x ≤ b, and let it have a derivative at
each point of the open interval a < x < b. Then there is a point x = X in the
open interval (a < X < b) such that

$$ f(b) - f(a) = (b - a) f'(X). \tag{1.2-1} $$

The theorem has a geometrical interpretation. Represent the function 
graphically by the curve y = f(x), and let A, B be the points on the curve
corresponding to x = a, x = b, respectively. The formula (1.2-1) states that there
is some point on the curve, with abscissa x = X, at which the tangent is parallel
to the line AB. There may be more than one suitable value of X; the essential
thing is that there is always at least one (see Fig. 11).

It is worth noting that (1.2-1) remains true if we exchange a and b, for both
sides merely change sign when this is done. Thus, suppose x₁, x₂ are the end-points
of an interval on which the conditions of the law of the mean are satisfied. Then
we can write

$$ f(x_2) - f(x_1) = (x_2 - x_1) f'(\xi), \tag{1.2-2} $$

where x = ξ is some point between x₁ and x₂. In writing this formula we do not
need to know which of the numbers x₁, x₂ is the larger.

[Fig. 11]

The geometrical interpretation of the law of the mean makes its truth 
plausible. A proof must be based on analytical reasoning, however. We follow 
the usual procedure of basing the proof on an auxiliary theorem named after the 
seventeenth-century mathematician Michel Rolle. 

ROLLE’S THEOREM. Let g be a function which satisfies the conditions of
Theorem IV, and suppose further that g(a) = g(b) = 0. Then for some X such
that a < X < b it is true that g'(X) = 0.

Proof. We distinguish two cases: (1) the case in which g(x) is zero on the
whole interval, and (2) the case in which g(x) assumes values other than 0 at
some points of the interval. In case (1) the derivative g'(x) is identically zero and
the existence of X is assured. In case (2) let M and m be the maximum and
minimum values, respectively, of g(x) on the closed interval [a, b]. At least one
of the values M, m must be different from 0, and must therefore occur at a point
x = X of the open interval (a, b). By Theorem III we conclude that g'(X) = 0.


We have glossed over the main difficulty in this proof, namely the matter of 
the existence of the extreme values M and m. Here we appeal to the theorem 
which asserts that if a function is continuous on a finite closed interval, it 
actually attains its absolute extremal values at certain points of the interval. This 
theorem has been referred to before (see §1.12, following Example 3); we treat
it formally in §3.2.

The law of the mean is deduced from Rolle’s theorem by an artifice. The 
function f in Theorem IV need not vanish at x = a and x = b. Suppose, however,
that y = F(x) is the equation of the straight line AB in Fig. 11, so that
F(a) = f(a), F(b) = f(b). Let g(x) = F(x) − f(x). Then g will be a function
meeting the conditions of Rolle’s theorem. The equation of the line in question is

$$ y = \frac{f(b) - f(a)}{b - a}(x - a) + f(a). $$

Hence we set

$$ g(x) = \frac{f(b) - f(a)}{b - a}(x - a) + f(a) - f(x). $$

Note that g(a) = g(b) = 0. The derivative is

$$ g'(x) = \frac{f(b) - f(a)}{b - a} - f'(x). $$


The conclusion g'(X) = 0 of Rolle’s theorem is now seen to be equivalent to the 
law of the mean in the form (1.2-1). 
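Written out, the equivalence is a one-line rearrangement. The display below is a sketch that uses only the symbols already introduced in the proof:

```latex
\[
  g'(X) = \frac{f(b) - f(a)}{b - a} - f'(X) = 0
  \quad\Longleftrightarrow\quad
  f(b) - f(a) = (b - a)\, f'(X).
\]
```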

We now give some simple examples illustrating the law of the mean in
particular instances.

Example 1. Suppose f(x) = x³. Find a suitable value for X in (1.2-1) when
a = −1 and b = 2.

Since f'(x) = 3x², the law of the mean takes the form b³ − a³ = (b − a)3X² in
the present case. With a = −1, b = 2 we have 8 − (−1)³ = [2 − (−1)]3X², or
X² = 1. Solving, we find X = ±1. We want a value of X such that a < X < b,
i.e., −1 < X < 2. Hence X = 1 is the suitable value in question.
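When the equation f'(X) = (f(b) − f(a))/(b − a) cannot be solved by hand, a suitable X can be located numerically. The sketch below uses bisection, which is a technique chosen here for illustration and not one used in the text; the function name `mean_value_point` and the bracket [0, 2] are assumptions, and the bracket must be one on which f'(X) minus the chord slope changes sign.

```python
def mean_value_point(f_prime, slope, lo, hi, iterations=80):
    """Bisection for f'(X) = slope on [lo, hi].

    Assumes f' is continuous there and that f'(lo) - slope and
    f'(hi) - slope have opposite signs (or one of them is zero).
    """
    g_lo = f_prime(lo) - slope
    g_hi = f_prime(hi) - slope
    if g_lo == 0:
        return lo
    if g_hi == 0:
        return hi
    if g_lo * g_hi > 0:
        raise ValueError("bracket does not enclose a sign change")
    for _ in range(iterations):
        mid = (lo + hi) / 2
        g_mid = f_prime(mid) - slope
        if g_mid == 0:
            return mid
        if g_lo * g_mid < 0:
            hi = mid                 # sign change lies in [lo, mid]
        else:
            lo, g_lo = mid, g_mid    # sign change lies in [mid, hi]
    return (lo + hi) / 2


def f(x):
    return x ** 3


def f_prime(x):
    return 3 * x ** 2


a, b = -1.0, 2.0
slope = (f(b) - f(a)) / (b - a)              # slope of the chord AB
X = mean_value_point(f_prime, slope, 0.0, b)  # bracket chosen by inspection
print(slope, X)
```

The law of the mean guarantees that some X exists in (a, b); it does not say where, so the choice of bracket is up to the user.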

Example 2. If f(x) = x², show that the suitable value of X in the law of the
mean is X = (a + b)/2.

We have f'(x) = 2x. Hence the law of the mean becomes b² − a² = (b − a)2X,
or X = (a + b)/2. Where is this point located in relation to a and b?

Example 3. If f(x) = sin x and x₁ = 0, x₂ = 5π/6, find ξ such that x₁ < ξ < x₂
and (1.2-2) holds.

We have f'(x) = cos x and sin(5π/6) = 1/2. Hence (1.2-2) takes the form

$$ \tfrac{1}{2} - 0 = \sin\frac{5\pi}{6} - \sin 0 = \frac{5\pi}{6}\cos\xi, $$

or

$$ \cos\xi = \frac{3}{5\pi} = 0.19099. $$

Since 0 < ξ < 5π/6, we find ξ = cos⁻¹(0.19099) = 1.37863. This is somewhat less
than π/2, which is to be expected, as may be seen from a carefully drawn graph.
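The arithmetic of Example 3 is easy to reproduce. The following lines are a small check, assuming f(x) = sin x on the interval from 0 to 5π/6; the variable names are chosen here for illustration.

```python
import math

x1, x2 = 0.0, 5 * math.pi / 6

# Chord slope (f(x2) - f(x1)) / (x2 - x1) for f(x) = sin x; equals 3/(5*pi).
slope = (math.sin(x2) - math.sin(x1)) / (x2 - x1)

# f'(xi) = cos(xi) = slope; acos returns the solution in [0, pi], which contains (x1, x2).
xi = math.acos(slope)

print(round(slope, 5), round(xi, 5))
```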

In actual practice we are seldom interested in the exact value of the X 
occurring in (1.2-1); the important thing is that X lies between x = a and x = b.
This enables us to obtain inequalities for estimating the value of f(x). 

Example 4. Show that 1/3 < log 1.5 < 1/2.

This may be done as follows: We take f(x) = log x, a = 1, b = 1.5; then
f'(x) = 1/x, and by the law of the mean

$$ \log 1.5 - \log 1 = \log 1.5 = (1.5 - 1)\frac{1}{X} = \frac{0.5}{X}, $$

where 1 < X < 1.5. It follows that

$$ \frac{0.5}{1.5} < \frac{0.5}{X} < \frac{0.5}{1}, \quad \text{or} \quad \frac{1}{3} < \frac{0.5}{X} < \frac{1}{2}. $$

This gives the required result.
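The same reasoning bounds log b − log a for any 0 < a < b, since 1/b < 1/X < 1/a. The sketch below packages that bound; the function name `log_difference_bounds` is hypothetical, and `math.log` is the natural logarithm, which is the sense of "log" assumed here.

```python
import math


def log_difference_bounds(a, b):
    """Bounds (b - a)/b < log b - log a < (b - a)/a from the law of the mean, for 0 < a < b."""
    if not 0 < a < b:
        raise ValueError("requires 0 < a < b")
    return (b - a) / b, (b - a) / a


lower, upper = log_difference_bounds(1.0, 1.5)
value = math.log(1.5) - math.log(1.0)

print(lower, value, upper)
```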

Example 5. Use the law of the mean to show that

$$ \frac{\log x}{x} < \frac{\log a}{x} + \frac{1}{a} \tag{1.2-3} $$

if 0 < a < x.

We use (1.2-1) with f(x) = log x and b replaced by x. Then

$$ \log x - \log a = \frac{x - a}{X}, \qquad a < X < x. $$

Hence

$$ \frac{\log x}{x} = \frac{\log a}{x} + \frac{x - a}{x} \cdot \frac{1}{X}. $$

Now

$$ \frac{x - a}{x} \cdot \frac{1}{X} < \frac{1}{X} < \frac{1}{a}, $$

and so (1.2-3) is seen to be correct.


A much used variant form of the law of the mean is obtained from (1.2-1) in
the following way: Let h = b − a, so that b = a + h. Then X may be written
X = a + θh, where 0 < θ < 1, because any number between a and a + h can be
expressed in this latter form. Thus we have

$$ f(a + h) = f(a) + h f'(a + \theta h), \qquad 0 < \theta < 1. \tag{1.2-4} $$

Example 6. Use (1.2-4) to show that

$$ (1 + h)^\alpha > 1 + \alpha h \tag{1.2-5} $$

if h > 0 and α > 1.

We take f(x) = x^α, a = 1. Then by (1.2-4),

$$ (1 + h)^\alpha = 1 + h\alpha(1 + \theta h)^{\alpha - 1}. $$

If 0 < h we have (1 + θh)^(α−1) > 1, since 1 + θh > 1 and α > 1. The inequality
(1.2-5) follows at once.
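For f(x) = x^α the number θ in (1.2-4) can actually be solved for, which makes the mechanism of Example 6 visible. The sketch below is an illustration under the assumptions h > 0 and α > 1; the names `bernoulli_gap`, `theta_for` and the sample values are chosen here and are not from the text.

```python
def bernoulli_gap(h, alpha):
    """Difference (1 + h)^alpha - (1 + alpha*h); inequality (1.2-5) says it is positive."""
    return (1 + h) ** alpha - (1 + alpha * h)


def theta_for(h, alpha):
    """Solve (1 + h)^alpha = 1 + h*alpha*(1 + theta*h)^(alpha - 1) for theta (h > 0, alpha > 1)."""
    ratio = ((1 + h) ** alpha - 1) / (h * alpha)     # equals (1 + theta*h)^(alpha - 1)
    return (ratio ** (1 / (alpha - 1)) - 1) / h


# Hypothetical sample of admissible (h, alpha) pairs.
samples = [(h, alpha) for h in (0.01, 0.5, 1.0, 10.0) for alpha in (1.5, 2.0, 3.0, 7.5)]

for h, alpha in samples:
    print(h, alpha, bernoulli_gap(h, alpha), theta_for(h, alpha))
```

In every sampled case θ falls strictly between 0 and 1, as (1.2-4) asserts; for α = 2 the derivative is linear, so θ comes out as 1/2 in agreement with Example 2.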

One very important consequence of the law of the mean will be formally 
stated here as a theorem. 


THEOREM V. Let f be differentiable at each point of the open interval a < x < b,
and suppose that f'(x) = 0 at each such point. Then the value of the
function is constant on the interval.

Proof. Since f is differentiable, it is continuous at each point
of the open interval (a, b). Consider any two distinct points of the interval, say
x₁, x₂, where a < x₁ < x₂ < b. We may apply the law of the mean, obtaining
formula (1.2-2). But f'(ξ) = 0, by hypothesis, and so f(x₁) = f(x₂). We have now
proved the theorem, for we have shown that f has the same value at any two
points of the interval.

Theorem V plays an essential role in the explanation of the relation between
differentiation and integration, as we shall see when we come to the proof of
Theorem VIII, §1.53. Theorem V is also useful in dealing with the concept of
the “general solution” of a certain elementary type of differential equation, as 
we shall see in §1.4. 

The following example also affords an important application of the law of
the mean.

Example 7. Suppose that f satisfies the conditions of the law of the mean on
the interval a ≤ x ≤ b, and that f'(x) > 0 when a < x < b. Show that f(x)
increases as x increases.

We are to show that x₁ < x₂ implies f(x₁) < f(x₂) whenever a ≤ x₁ < x₂ ≤ b.
The law of the mean tells us that there is some ξ such that x₁ < ξ < x₂ and
f(x₂) − f(x₁) = (x₂ − x₁)f'(ξ). Since (x₂ − x₁) > 0 and f'(ξ) > 0, we infer that
f(x₂) − f(x₁) > 0; this is equivalent to f(x₁) < f(x₂).

Example 8. Suppose that f is defined and differentiable on an open interval
containing the point x₀. Suppose that f'(x₀) = 0 and that for all x sufficiently near
x₀, f'(x) > 0 when x < x₀ and f'(x) < 0 when x > x₀. Show that these conditions
are sufficient to guarantee that f has a relative maximum at x₀.

The argument is based on Example 7. As x increases, f(x) increases when
f'(x) > 0. By similar reasoning f(x) decreases when x increases if f'(x) < 0. In the
present case we see that the given conditions imply that for some small number
h, f(x) is increasing as x goes from x₀ − h to x₀, and decreasing as x goes from x₀
to x₀ + h. Hence f(x) must attain a relative maximum at x₀.

From this argument it will be apparent to the student how one may 
formulate sufficient conditions for a relative minimum at x₀.
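The sign conditions of Example 8, together with their mirror image for a minimum, can be tried out numerically. The sketch below is a heuristic only: it samples the sign of f' at finitely many points on each side of x₀, so it can suggest a classification but cannot prove one. The function name `first_derivative_test` and the parameters `h` and `samples` are assumptions of this illustration.

```python
def first_derivative_test(f_prime, x0, h=1e-3, samples=50):
    """Classify x0 by sampling the sign of f' on (x0 - h, x0) and on (x0, x0 + h)."""
    left = [f_prime(x0 - h * k / samples) for k in range(1, samples + 1)]
    right = [f_prime(x0 + h * k / samples) for k in range(1, samples + 1)]
    if all(v > 0 for v in left) and all(v < 0 for v in right):
        return "relative maximum"    # increasing up to x0, decreasing after
    if all(v < 0 for v in left) and all(v > 0 for v in right):
        return "relative minimum"    # decreasing up to x0, increasing after
    return "inconclusive"


# f(x) = -(x - 1)^2, f(x) = x^2 and f(x) = x^3, given through their derivatives.
print(first_derivative_test(lambda x: -2 * (x - 1), 1.0))
print(first_derivative_test(lambda x: 2 * x, 0.0))
print(first_derivative_test(lambda x: 3 * x * x, 0.0))
```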

EXERCISES 

1. Use the law of the mean to show that 1/9 < √66 − 8 < 1/8.

2. Prove that there is no value of m such that x³ − 3x + m = 0 has two distinct roots
in the interval Use Rolle’s theorem.

3. If f(x) = x³ − 3x² + 2x, a = 0, h=i, find a suitable value of θ in the formula
(1.2-4).

4. For what values of C is Cx - sin x an increasing function of x (for all x)? 

5. Show that 2/π < (sin θ)/θ < 1 if 0 < θ < π/2. Hint: Examine the sign of the
derivative of (sin θ)/θ.

6. Prove the following inequalities, using the law of the mean. 

(a) √(1 + x) < 4 + (x − 15)/8 if x > 15.

(b) tan⁻¹ x < (π/4) + (x − 1)/2 if 1 < x.

( c ) tan x < ‘4 — 2 lf 0<x < L 

(d) h/(1 + h²) < tan⁻¹ h < h if 0 < h.

7. Prove that the inequality (1.2-5) also holds if -1 < h <0. Explain the reasoning 
about inequalities with care, noting that if 0 < A < 1 and B < 0, then AB > B. 

8. Prove the following inequalities: 

(a) x/(1 + x) < log(1 + x) < x, if −1 < x < 0 or 0 < x.

(b) 1 + x/(2√(1 + x)) < √(1 + x) < 1 + x/2, if −1 < x < 0 or 0 < x.

(c) 1 − x/2 < 1/(1 + x)^(1/2) < 1 − x/(2(1 + x)^(3/2)), if −1 < x < 0 or 0 < x.

(d) p(x − 1) < x^p − 1 < p x^(p−1) (x − 1) if 1 < x, 1 < p.

(e) x^m − 1 < m(x − 1) if 0 < m < 1, 1 < x.

9. If a > b > 0 and m and n are positive numbers such that m + n = 1, show that
a^m b^n < ma + nb. Use Exercise 8(e).

10. (a) Prove that x^n + ax + b = 0 (a, b real) has at most two distinct real roots if n is
even, and at most three if n is odd.

(b) Prove that x^n + ax² + b = 0 (a, b real) has at most three distinct real roots if n is odd,
and at most four if n is even.

11. Explain why a₀x⁴ + a₁x³ + a₂x² + a₃x + a₄ = 0 must have a root between 0 and 1 if
(a₀/5) + (a₁/4) + (a₂/3) + (a₃/2) + a₄ = 0.

12. Let F(x) = (f(x) − f(a))(g(b) − g(x)), where f and g are continuous when a ≤ x ≤ b
and differentiable when a < x < b. Suppose further that g'(x) is never zero. Show that
there is a ξ between a and b such that

$$ \frac{f'(\xi)}{g'(\xi)} = \frac{f(\xi) - f(a)}{g(b) - g(\xi)}. $$

13. Suppose f''(x) > 0 when a ≤ x ≤ b. Explain by an analytical argument why there
can be at most one point of the interval at which f'(x) = 0. What is the geometrical
interpretation as regards the curve y = f(x)?

14. Suppose f is continuous on the interval a < x < b, and that f is known to be
differentiable on this interval except possibly at one point x₀. Suppose further that
lim_{x→x₀} f'(x) exists. Use the law of the mean to prove that f is differentiable at x₀ and that
the derivative f' is continuous at that point.

15. Generalize the result of Illustrative Example 7 as follows: Suppose f is
continuous on a ≤ x ≤ b and differentiable on a < x < b. Suppose further that f'(x) ≥ 0 when
a < x < b and that f'(x) > 0 for at least one value of x. Prove that f(a) < f(b). [It is easy
to see that f(a) ≤ f(b); what requires more care is to show that f(a) ≠ f(b).]

16. Suppose f(x) = x² sin(1/x) + (x/2) if x ≠ 0, and define f(0) = 0. (a) Show that
f'(0) > 0. (b) Show that, no matter how small the positive number h may be, there are
infinitely many points on both sides of x = 0 and within distance h of x = 0 at which
f'(x) = 3/2 and also infinitely many at which f'(x) = −1/2. This shows that there is no interval
containing x = 0 in which f(x) is always increasing as x increases, in spite of the fact that
the slope of the curve y = f(x) is positive at x = 0.

17. Suppose the following things: that f is continuous when 0 ≤ x ≤ a, differentiable
when 0 < x < a; that f(0) = 0; and that f'(x) increases as x increases. Show that f(x)/x
increases as x increases. (A suitable use of the law of the mean is indicated.) Examples
are furnished by f(x) = e^x − 1 and by f(x) = tan x, 0 ≤ x < π/2.

18. Let f be differentiable when a < x < b. Suppose x₁ and x₂ are distinct points of
the interval, and let P₁ and P₂ be the corresponding points of the curve.

(a) Show that the condition for P₁ to lie above the line tangent to the curve at P₂ is

$$ f(x_1) - f(x_2) - (x_1 - x_2) f'(x_2) > 0. $$

(b) If the condition in (a) holds for every pair of points on the curve, show that f'(x)
increases as x increases. In fact, if x₁ < x₂, show that

$$ f'(x_1) < \frac{f(x_2) - f(x_1)}{x_2 - x_1} < f'(x_2). $$

(c) Show, conversely, that if f'(x) increases as x increases, the condition in (a) is satisfied
whenever x₁ ≠ x₂. (Use the law of the mean.)

(d) Under the conditions in (c) show that the curve y = f(x) between any two of its points
lies entirely below the chord joining those points. Begin by showing that an analytic
expression of this state of affairs is

$$ \frac{f(x) - f(x_1)}{x - x_1} < \frac{f(x_2) - f(x)}{x_2 - x} $$

whenever x₁ < x < x₂.


1.3 / DIFFERENTIALS 

The notion of a differential is closely related to that of a derivative. For 
functions of a single independent variable this relationship is very close indeed, 
and very simple. For functions of several independent variables the relationship 
is less simple. At this point we are concerned only with functions of a single 
variable. We presume that the student is acquainted with differentials and their 
uses in the formal procedures of elementary calculus. Our purpose here is 
mainly to define differentials carefully and demonstrate the fundamental pro- 
perty upon which much of the usefulness of differentials depends. 

Suppose that f is a function of the independent variable x, and let us assume
that f is differentiable for certain values of x (i.e., that f'(x) exists for these
values). 

Definition. Let dx denote an independent variable which may take on any value 
whatsoever. Then the function of x and dx whose value is f'(x)dx is called the 
differential of f. Observe that the differential is a homogeneous linear function of
dx; that is, for a fixed value of x, the differential has as its value a fixed multiple
of dx. 

If we write y = f(x), and if f is differentiable for a particular value of x, it is
customary to write

$$ dy = f'(x)\,dx, \tag{1.3-1} $$

so that dy is the value of the differential of f for assigned values of x and dx.

If we regard x as fixed, dy is a dependent variable whose value depends on the
independent variable dx. The variables dx and dy are often referred to as the
differentials of x and y, respectively.

The adjacent Fig. 12 illustrates geometrically the functional dependence of dy
on dx, as well as the relation to the function f itself. The xy-co-ordinate axes
and the graph of the function y = f(x) are shown in unbroken lines. A second
co-ordinate system is shown with its origin at a typical point (x, y) of the curve
y = f(x). The axes in this system are scales for the measurement of the variables
dx, dy. The equation (1.3-1) has as its graph a straight line of slope f'(x). This
line is, of course, the tangent to the curve y = f(x) at the origin of the dx-dy
co-ordinate system.

[Fig. 12]

From (1.3-1) we have the quotient relation

$$ \frac{dy}{dx} = f'(x) \tag{1.3-2} $$

whenever dx ≠ 0. The d-notations dx and dy go back to Leibniz’s work in the
seventeenth century, but Leibniz did not define the derivative by the limit of a 
quotient as we did in (1.11-4). It is to be emphasized that there is no need for dx 
and dy to be small in (1.3-2). 

Example 1. If y = f(x) = sin x, calculate the value of dy for x = π/3, dx =
π/6. Here f'(x) = cos x, so dy = cos x dx. Evaluating, we obtain

$$ dy = \left(\cos\frac{\pi}{3}\right)\frac{\pi}{6} = \frac{\pi}{12}. $$
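Because dx need not be small, dy in Example 1 is not an approximation to anything; it is simply the value of a linear function of dx. The sketch below computes dy and, for contrast, the actual change in sin x over the same step. The helper name `differential` and the comparison with the increment are additions made here for illustration.

```python
import math


def differential(f_prime, x, dx):
    """Value f'(x) * dx of the differential for assigned x and dx."""
    return f_prime(x) * dx


x, dx = math.pi / 3, math.pi / 6

dy = differential(math.cos, x, dx)          # cos(pi/3) * pi/6 = pi/12
delta_y = math.sin(x + dx) - math.sin(x)    # actual change of sin over the step

print(dy, delta_y)
```

With dx = π/6 the two numbers differ noticeably (about 0.262 against about 0.134), which underlines that the definition of dy makes no demand that dx be small.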

Probably the most important feature of the formula (1.3-1) is that its truth is 
unaffected by the introduction of a new independent variable. 

Example 2. Suppose y = x² and x = t³ + t, so that y = (t³ + t)² = t⁶ + 2t⁴ + t². If
we regard x as an independent variable, then dy = 2x dx, by (1.3-1). Here dx is
an independent variable. But if we regard t as an independent variable, then both
x and y are dependent on t, and the notations dy, dx acquire new meanings:

$$ x = t^3 + t, \qquad dx = (3t^2 + 1)\,dt, $$
$$ y = t^6 + 2t^4 + t^2, \qquad dy = (6t^5 + 8t^3 + 2t)\,dt. $$

But even with these new meanings, it is still true that dy = 2x dx. We verify this
by writing

$$ 2x\,dx = 2(t^3 + t)(3t^2 + 1)\,dt = (6t^5 + 8t^3 + 2t)\,dt = dy. $$
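The identity verified in Example 2 is a polynomial identity in t, so it can be confirmed by exact integer evaluation at more points than the degree. The sketch below does this; the function names are chosen here, and the common factor dt is dropped from both sides.

```python
def x_of(t):
    """x = t^3 + t."""
    return t ** 3 + t


def dx_dt(t):
    """Coefficient of dt in dx = (3t^2 + 1) dt."""
    return 3 * t ** 2 + 1


def dy_dt(t):
    """Coefficient of dt in dy = (6t^5 + 8t^3 + 2t) dt."""
    return 6 * t ** 5 + 8 * t ** 3 + 2 * t


def two_x_dx_dt(t):
    """Coefficient of dt in 2x dx, with x and dx expressed through t."""
    return 2 * x_of(t) * dx_dt(t)


# Two polynomials of degree 5 that agree at 11 integer points are identical.
agree = all(two_x_dx_dt(t) == dy_dt(t) for t in range(-5, 6))
print(agree)
```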

What we have verified here in a particular case may be demonstrated in 
general by appealing to the rule for differentiating a composite function 
(Theorem II, §1.11). Suppose y = f(x) and x = g(t), so that y = F(t), where
F(t) = f(g(t)). Then, with t as independent variable,

$$ dy = F'(t)\,dt, \qquad dx = g'(t)\,dt. \tag{1.3-3} $$

By Theorem II we have

$$ F'(t) = f'(x)\,g'(t). \tag{1.3-4} $$

Hence combining (1.3-3) and (1.3-4),

$$ dy = f'(x)\,g'(t)\,dt = f'(x)\,dx, $$

so that (1.3-1) holds, even though x and dx are no longer independent variables.

The use of differentials is a great convenience in algebraic manipulations
which are incidental to much work in calculus. Using differentials rather than
derivatives, one is often enabled to retain a desirable symmetry by not forcing a
decision as to which variable is independent. The differential formula for arc
length of a plane curve illustrates this point. The formula is

$$ ds^2 = dx^2 + dy^2; \tag{1.3-5} $$

the co-ordinates (x, y) of a point on the curve are functions of some parameter, 
and the arc length s, measured from some chosen initial point on the curve, 
likewise depends on the parameter. But the formula (1.3-5) holds (granted 
suitable conditions on the curve) no matter what the parameter may be. 

The fact that f'(x) is the ratio of dy to dx no matter what variable is 
independent is of great usefulness when we wish to compute the slope of a curve 
defined parametrically. 
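As an illustration of the last remark, the slope of a parametric curve is the ratio of dy = y'(t) dt to dx = x'(t) dt. The sketch below uses the cycloid x = t − sin t, y = 1 − cos t, which is an example chosen here and not taken from the text, and compares the ratio of differentials with the slope of a short chord.

```python
import math


def x_of(t):
    return t - math.sin(t)


def y_of(t):
    return 1 - math.cos(t)


def dx_dt(t):
    return 1 - math.cos(t)


def dy_dt(t):
    return math.sin(t)


def slope(t):
    """dy/dx as the ratio of the differentials dy and dx (requires dx/dt != 0)."""
    return dy_dt(t) / dx_dt(t)


def chord_slope(t, dt=1e-6):
    """Slope of the chord between the curve points at t - dt and t + dt."""
    return (y_of(t + dt) - y_of(t - dt)) / (x_of(t + dt) - x_of(t - dt))


print(slope(math.pi / 2), slope(1.0), chord_slope(1.0))
```

The parameter t never has to be eliminated: the common factor dt cancels in the ratio, which is exactly the convenience described above.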


EXERCISES 

1. (a) If y = (1 − x²)/(1 + x²), compute dy when x = 1, dx = 2.

(b) If x = tan(t/2), compute dx when t = π/2, dt = 2.

(c) If x in (a) is replaced by its value in terms of t from (b), show that, on simplification,
y = cos t. From this formula compute dy when t = π/2, dt = 2. Compare the answers to
(b) and (c) with your result in part (a).

2. From x = r cos θ, y = r sin θ, and ds² = dx² + dy² derive the formula ds² =
dr² + r² dθ².

3. (a) If y = f(x), y' = f'(x), and so on, why is dy' = y'' dx?

(b) What are dy'' and d(y')² when expressed with dx as a factor?

(c) Suppose that x = f(t), y = g(t), and write x' = f'(t), y' = g'(t), and so on. Show that

$$ \frac{d}{dt}\left(\frac{dy}{dx}\right) = \frac{x'y'' - y'x''}{(x')^2}. $$


4. For a plane curve C, construct the tangent at a typical
point P(x, y), and let angles φ, ψ, θ be as indicated on Fig. 13,
so that for the general case φ = θ + ψ + nπ, where n is an
integer (n = 0 in Fig. 13). From this equation and the relations

$$ \frac{dy}{dx} = \tan\phi, \qquad \frac{y}{x} = \tan\theta, $$

show that

$$ \tan\psi = \frac{x\,dy - y\,dx}{x\,dx + y\,dy} = \frac{r\,d\theta}{dr}. $$


5. From ctn ψ = r'/r (see Exercise 4) find dψ in terms
of r, r', r'', and dθ, where r' = dr/dθ and r'' = d²r/dθ².



6. The curvature of a plane curve y = f(x) is defined as K = dφ/ds, where dy/dx =
tan φ and ds² = dx² + dy². Derive the formula

7. Using the definition of K in Exercise 6, and the relation between φ, ψ, and θ in
Exercise 4, derive the formula

$$ K = \pm\frac{r^2 + 2r'^2 - rr''}{(r^2 + r'^2)^{3/2}}. $$


(Exercise 5 should be worked first.) 

8. Let y = f(x) be the equation of a plane curve C. The center of curvature of the
curve corresponding to a particular point (x, y) on C is a point (X, Y) a distance R from
(x, y) along the normal to C at (x, y), in the direction toward the concave side of the
curve; here R is the radius of curvature. It may be shown that

$$ X = x - \frac{y'(1 + y'^2)}{y''}, \qquad Y = y + \frac{1 + y'^2}{y''}. $$

It is assumed, of course, that y'' ≠ 0.

The locus of (X, Y) as (x, y) moves along C is a curve called the evolute of C. We
may regard the above equations as parametric equations of the evolute, with x as
parameter. Show that dY/dX = −1/y'. This proves that the normal to C at (x, y) is
tangent to the evolute at (X, Y).


1.4 / THE INVERSE OF DIFFERENTIATION 

Many applications of calculus, some of them quite elementary, require the 
determination of a function from the knowledge of its first or second derivative, 
together with supplementary data about the function for particular values of the 
independent variable. The first and simplest general problem of this kind may be 
put as follows: 


Problem. Given a continuous function f(x), defined on a certain interval, find all
functions defined on this interval and having f(x) as derivative. In symbols, find y
as a function of x such that

$$ \frac{dy}{dx} = f(x). \tag{1.4-1} $$


Directly out of his or her experience with differentiation, the student is able 
to solve many problems of this type. The process is one of using standard 
differentiation formulas in reverse. Let us examine the reasoning carefully in a 
typical case. 

Example. Find y as a function of x such that y = 1 when x = 0 and

$$ \frac{dy}{dx} = \frac{x}{1 + x^2}. \tag{1.4-2} $$

Here f(x) is defined and continuous for all values of x, so we want the solution
to be defined for all values of x.

The normal procedure is to write

$$ dy = \frac{x\,dx}{1 + x^2} = \frac{1}{2}\,\frac{d(1 + x^2)}{1 + x^2}, \tag{1.4-3} $$

$$ y = \tfrac{1}{2}\log(1 + x^2) + C; \tag{1.4-4} $$

$$ 1 = \tfrac{1}{2}\log(1 + 0) + C, \qquad C = 1; $$

$$ y = \tfrac{1}{2}\log(1 + x^2) + 1. \tag{1.4-5} $$


The supporting argument (usually not made explicit in practice) runs as follows:
If there is a function $y$ satisfying (1.4-2), then it also satisfies (1.4-3). In this
latter form, as finally written, we recognize that a $y$ satisfying (1.4-2) is
furnished by (1.4-4), where $C$ is an arbitrary constant. The proper determination
of $C$ is made by substituting the given matched values for $x$ and $y$. Thus we
obtain (1.4-5), which is actually a solution of the problem, as may be checked.

One question remains: Is (1.4-5) the only solution to the problem? Once we
obtain (1.4-4) it is clear that $C$ is uniquely determined by the condition that $y = 1$
when $x = 0$. The question is then: Does (1.4-4) give all the functions $y$ satisfying
(1.4-2)? The answer is affirmative, and is supplied by Theorem V, §1.2. For let $y$
be any function which is differentiable for all values of $x$ and satisfies (1.4-2).
Then the derivative of the function

$$y - \tfrac{1}{2}\log(1 + x^2)$$

is zero for all values of $x$; hence, by Theorem V, this function is constant, so
that $y$ is given by (1.4-4) for some value of $C$.
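
The remark that (1.4-5) "may be checked" can be illustrated numerically. The following is only a sketch: the helper `central_difference`, the step size, and the sample points are illustrative choices, not part of the text. It compares a difference quotient of the candidate solution with the right-hand side of (1.4-2) and confirms the condition $y = 1$ when $x = 0$.

```python
import math

def y(x):
    """Candidate solution (1.4-5): y = (1/2) log(1 + x^2) + 1."""
    return 0.5 * math.log(1.0 + x * x) + 1.0

def f(x):
    """Right-hand side of (1.4-2): x / (1 + x^2)."""
    return x / (1.0 + x * x)

def central_difference(g, x, h=1e-5):
    """Symmetric difference quotient, a numerical stand-in for g'(x)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)

# Illustrative sample points (an assumption; any real x would do).
sample_points = [-3.0, -1.0, 0.0, 0.5, 2.0, 10.0]
max_error = max(abs(central_difference(y, x) - f(x)) for x in sample_points)
initial_value = y(0.0)

print("y(0) =", initial_value)
print("largest |y'(x) - f(x)| over the sample:", max_error)
```

A check of this kind is evidence, not proof; the proof that (1.4-5) is the only solution rests on Theorem V, as explained above.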

The equation (1.4-1) is the very simplest type of first-order differential 
equation. The problem which we have posed in connection with this equation 
may be called the problem of finding the “general solution” of (1.4-1). The 
general solution is the family of all functions y = y(x) satisfying the equation. 
Any member of this family may be called “a particular solution.” By appeal to 
Theorem V, §1.2, we obtain the following conclusion: 

If $y_1(x)$ and $y_2(x)$ are any two particular solutions of (1.4-1), they differ by a
constant. Thus, if $y_1(x)$ is any particular solution, the general solution is given by

$$y = y_1(x) + C, \tag{1.4-6}$$

where $C$ is an arbitrary constant. In all this we assume, of course, that all the
solutions considered are differentiable on the interval where $f(x)$ is defined.

The main problem is thus reduced to the finding of any one particular 
solution of (1.4-1). In many important simple problems such a particular solution 
may be found either by direct inspection or by various ingenious devices, all of 
which depend upon extensive familiarity with formulas of differentiation and 
manipulation of differentials. But it is not difficult to give examples in which no 
solution is forthcoming from the class of functions which a student meets and 
learns to differentiate in elementary calculus. Thus, for example, the equations

$$\frac{dy}{dx} = e^{-x^2}, \qquad \frac{dy}{dx} = \sqrt{1 + x^3}$$

have no solution within this class.

It is well to pause at this point and reflect upon the meaning of the word 
“function.” In elementary differential calculus practically all our experience is 
with functions of a few basic types: algebraic, trigonometric and inverse 
trigonometric, exponential and logarithmic, and rather simple compounding of 
these types. It turns out that within this class of “elementary” functions, 
differentiation always leads to functions which are again in the class. Such is not 
the case with the inverse of differentiation, however; there are elementary 
functions which are not derivatives of elementary functions, e.g., $e^{-x^2}$ and
$\sqrt{1 + x^3}$. Now the general theorems of calculus deal with functions which are
arbitrary except for requirements of differentiability or continuity, and which
certainly need not be elementary in the sense of the first part of this paragraph.
Once we have rejected the limitation of our considerations to “elementary”
functions, we may well ask: What nonelementary functions do we know? If $e^{-x^2}$
is not the derivative of any elementary function, how are we to find solutions of
the equation $dy/dx = e^{-x^2}$? Evidently it is necessary in some fashion to acquire a
supply of nonelementary functions of which we know the derivatives.

There are several very important methods for building such a supply. One 
method is that of integration of known functions. Starting with a given continuous
function $f(x)$ defined on some interval, we form

$$F(x) = \int_a^x f(t)\,dt, \tag{1.4-7}$$

where $a$ and $x$ belong to the interval, and $a$ is kept fixed. Another method is that
of forming infinite series whose terms are given functions of $x$:

$$F(x) = u_1(x) + u_2(x) + u_3(x) + \cdots.$$

We shall later be able to show that

$$\int_0^x \sqrt{1 + t^3}\,dt$$

is a function $F(x)$ such that $F'(x) = \sqrt{1 + x^3}$, and that

$$x - \frac{x^3}{3\cdot 1!} + \frac{x^5}{5\cdot 2!} - \frac{x^7}{7\cdot 3!} + \cdots$$

is a function $F(x)$ such that $F'(x) = e^{-x^2}$.
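
The claim about the series can be made plausible numerically before it is proved. The sketch below is an illustration under stated assumptions: the series is cut off after 40 terms, the derivative is replaced by a symmetric difference quotient, and the helper names are hypothetical. It also uses the known fact that this series equals $(\sqrt{\pi}/2)\,\mathrm{erf}(x)$, where `math.erf` is the error function from the Python standard library.

```python
import math

def series_F(x, terms=40):
    """Partial sum of x - x^3/(3*1!) + x^5/(5*2!) - x^7/(7*3!) + ..."""
    total = 0.0
    for k in range(terms):
        total += (-1) ** k * x ** (2 * k + 1) / ((2 * k + 1) * math.factorial(k))
    return total

def central_difference(g, x, h=1e-5):
    """Symmetric difference quotient, a numerical stand-in for g'(x)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)

xs = [0.0, 0.5, 1.0, 1.5]
derivative_errors = [abs(central_difference(series_F, x) - math.exp(-x * x)) for x in xs]
erf_gap = abs(series_F(1.0) - 0.5 * math.sqrt(math.pi) * math.erf(1.0))

print("max |F'(x) - exp(-x^2)| on the sample:", max(derivative_errors))
print("|F(1) - (sqrt(pi)/2) erf(1)|:", erf_gap)
```

The truncation at 40 terms is ample for the small values of $x$ sampled here; for large $|x|$ the alternating terms grow before they shrink, and a partial sum in floating point becomes unreliable.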

The study of functions defined by infinite series will concern us in a later 
chapter of this book. Our immediate interest will be confined to functions of the 
type (1.4-7) defined by integration. We shall presently learn (see the end of
§1.52) that if $f(x)$ is continuous on a given interval the general
solution of the equation $dy/dx = f(x)$ on that interval is $y = F(x) + C$, where $C$ is
an arbitrary constant and $F(x)$ is defined by (1.4-7). The integral is defined by a
limiting process involving sums.

Every student who has come this far in his study of calculus knows that 
there is a very close connection between differentiation and integration. But 
elementary calculus books vary widely in their discussions of integration and in 
their treatment of the link between differentiation and integration. Thus we shall 
not assume any uniformity of knowledge on this subject by the readers of this 
book. The next sections will be devoted to the subject of integration and its 
relation to differentiation. Our aim will be to lay out a precise logical pattern of 
definitions and theorems which will provide a common understanding of the 
integration concept and of the connection between differentiation and
integration.

EXERCISES 

1. What is the logical objection to the following procedure? Let the symbol $\int_0^t e^{-s^2}\,ds$
be defined as that function $F(t)$ such that $F'(t) = e^{-t^2}$ and $F(0) = 0$. Then
$\frac{d}{dt}\bigl(F(t) + C\bigr) = F'(t) = e^{-t^2}$, and therefore $v = \int_0^t e^{-s^2}\,ds + C$ is the general
solution of the equation $dv/dt = e^{-t^2}$.

2. What is the logical objection to the following procedure? Let $f(x)$ be a given
function defined when $a \le x \le b$. Let $F(x)$ be any differentiable function such that
$F'(x) = f(x)$ for each $x$ of the given interval. Then define $\int_a^b f(x)\,dx = F(b) - F(a)$.

3. Try to find a continuous $F(x)$ such that $F(0) = 0$ and $F'(x) = [x]$ when $0 \le x \le 3$
(where $[x]$ denotes the greatest integer $\le x$). Do you succeed fully? Where is the
difficulty?


1.5 / DEFINITE INTEGRALS 

We are going to define what we mean by the definite integral of a function. We 
start with a function $f$, which we suppose to be defined and continuous on a
closed interval $a \le x \le b$. These things being given, the definite integral of $f$ over
the interval is a certain number, which we denote by

$$\int_a^b f(x)\,dx, \tag{1.5-1}$$

and which we arrive at by a defining process which we shall outline in four steps,
as follows:

Step 1. Choose an integer $n \ge 1$ and subdivide the interval $[a, b]$ into $n$
subintervals by choosing points $x_0, x_1, \ldots, x_n$ such that $a = x_0 < x_1 < \cdots < x_n = b$.
We adopt the notation

$$\Delta x_i = x_i - x_{i-1}, \qquad i = 1, \ldots, n.$$

Step 2. In each subinterval choose an arbitrary point, denoting the point in
the $i$th subinterval by $x'_i$, so that

$$x_{i-1} \le x'_i \le x_i.$$

Step 3. Find the value of the function at each of the points chosen in Step 2,
and form the sum

$$f(x'_1)\,\Delta x_1 + f(x'_2)\,\Delta x_2 + \cdots + f(x'_n)\,\Delta x_n. \tag{1.5-2}$$

This is called an approximating sum.

Step 4. Find the limit of the sums (1.5-2) as $n$ is increased and the maximum
of the numbers $\Delta x_1, \ldots, \Delta x_n$ is made to approach zero. This limit is, by
definition, the definite integral (1.5-1), so that

$$\int_a^b f(x)\,dx = \lim \sum_{i=1}^{n} f(x'_i)\,\Delta x_i. \tag{1.5-3}$$

In Chapter 18, we shall study the theory of integration systematically. At 
that time we shall prove that the sums (1.5-2) do actually approach a limit in the 
case of any continuous function. For the present we take for granted the 
existence of this limit, and its uniqueness. A further discussion of the limit 
concept associated with the integral will be found in §1.63. 
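
The four steps can be imitated on a computer. The following sketch is an illustration only: the function name `approximating_sum`, the use of random subdivision points, and the test integrand $f(x) = x^2$ on $[0, 1]$ are all assumptions made here, not part of the text. It forms the sum (1.5-2) for unequal subintervals and arbitrary points $x'_i$, and reports the largest $\Delta x_i$ alongside the sum.

```python
import random

def approximating_sum(f, a, b, n, rng):
    """Form one approximating sum (1.5-2) with a random subdivision of [a, b]."""
    # Step 1: n - 1 interior points, sorted, give n subintervals.
    interior = sorted(rng.uniform(a, b) for _ in range(n - 1))
    points = [a] + interior + [b]
    total = 0.0
    largest_dx = 0.0
    for left, right in zip(points, points[1:]):
        dx = right - left
        x_prime = rng.uniform(left, right)   # Step 2: arbitrary point in the subinterval
        total += f(x_prime) * dx             # Step 3: accumulate f(x'_i) * dx_i
        largest_dx = max(largest_dx, dx)
    return total, largest_dx

rng = random.Random(0)          # fixed seed so that runs are repeatable
f = lambda x: x * x
# Step 4 (imitated): watch the sums as n grows and the largest dx shrinks.
results = [approximating_sum(f, 0.0, 1.0, n, rng) for n in (10, 100, 1000, 10000)]
for total, largest_dx in results:
    print(f"largest dx = {largest_dx:.5f}   sum = {total:.6f}")
```

As the largest subinterval shrinks, the printed sums settle near $1/3$, whatever points $x'_i$ happen to be drawn. A finite table of this kind suggests the limit but does not establish its existence, which is the business of Chapter 18.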

A definite integral is thus defined as the limit of a certain kind of sum 
associated with the function. A geometrical interpretation of the integral can be 
made in terms of the area under the curve $y = f(x)$ from $x = a$ to $x = b$. Each
term in the approximating sum (1.5-2) is the area of one of the shaded rectangles 
in Fig. 14. The area under the curve is the limit of the sum of the areas of these 



rectangles. We assume that the student is already familiar with this geometrical 
interpretation of the integral, and with the extension of the interpretation (by the 
concept of negative area) to those situations where the curve y = f(x) goes 
below the x-axis. We emphasize, however, that we do not define the definite 
integral as the area under the curve. The area interpretation is merely a 
convenient method of bringing our intuition into play to aid us in grasping the 
nature of the definition (1.5-3). We “feel” that the area exists, and that a good 
approximation to it can be obtained by the sums (1.5-2), provided we take all the 
subintervals short enough. Actually, the area is defined as being equal to the 
limit in (1.5-3). 

Area is only one of many geometrical and physical concepts whose exact
quantitative measurement is furnished by a definite integral. The student is no 
doubt already familiar with such applications of integration as the finding of 
volumes (including volumes of solids of revolution) by slicing them into thin 
slabs or thin cylindrical shells, the calculation of arc-lengths of curves, the 
location of centroids and centers of gravity, and the reckoning of force due to 
water pressure on submerged plane surfaces. In every one of these applications 
the integral as the limit of a sum is the fundamental concept. Now we wish to 
emphasize that the concept is applicable to any continuous function, and that 
the concept itself does not depend on any geometrical or physical interpretation. 

Suppose that $m_i$ is the smallest value of $f(x)$ in the subinterval $x_{i-1} \le x \le x_i$.
If we choose $x'_i$ as a point at which $f(x)$ takes on the value $m_i$, the sum (1.5-2)
becomes

$$m_1\,\Delta x_1 + m_2\,\Delta x_2 + \cdots + m_n\,\Delta x_n. \tag{1.5-4}$$

This particular approximating sum is called a lower sum. Likewise we define an
upper sum by choosing $x'_i$ to be a point of the subinterval $x_{i-1} \le x \le x_i$ at which
$f(x)$ takes on its greatest value $M_i$ for that subinterval. The upper sum is

$$M_1\,\Delta x_1 + M_2\,\Delta x_2 + \cdots + M_n\,\Delta x_n. \tag{1.5-5}$$

The upper and lower sums have the property that

$$s \le \int_a^b f(x)\,dx \le S, \tag{1.5-6}$$

where s represents the lower sum and S represents the upper sum. This system 
of inequalities makes it possible to obtain numerical estimates of the value of an 
integral. Upper and lower sums are discussed at greater length in §18.1. 

Example 1. Estimate the value of $\int_2^6 (75x - x^3)\,dx$ by lower and upper sums,
using four equal subintervals.

We take $f(x) = 75x - x^3$, $x_0 = 2$, $x_1 = 3$, $x_2 = 4$, $x_3 = 5$, $x_4 = 6$. See Fig. 15. Each
$\Delta x_i = 1$. Since $f'(x) = 75 - 3x^2$, it appears that $f(x)$ increases from $x = 2$ to $x = 5$,
and decreases from $x = 5$ to $x = 6$. From the accompanying table of values we
can then read off the values of $m_i$ and $M_i$:

Fig. 15.

    x     f(x)
    2     142
    3     198
    4     236
    5     250
    6     234

    i     m_i     M_i
    1     142     198
    2     198     236
    3     236     250
    4     234     250

Thus the lower and upper sums are

$$s = 142 + 198 + 236 + 234 = 810,$$
$$S = 198 + 236 + 250 + 250 = 934.$$

Therefore

$$810 < \int_2^6 (75x - x^3)\,dx < 934.$$

A better estimate could be obtained by using more and smaller subintervals. 
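
The arithmetic of Example 1 is easy to reproduce. The sketch below assumes, as in the example, that $f$ is monotonic on each subinterval, so that its least and greatest values there occur at the endpoints; the function name `lower_upper_sums` is a hypothetical helper. For comparison it also prints the value given by the antiderivative $75x^2/2 - x^4/4$, a method justified only later, in §1.53.

```python
def f(x):
    return 75 * x - x ** 3

def lower_upper_sums(f, points):
    """Lower and upper sums (1.5-4), (1.5-5).

    Assumes f is monotonic on each subinterval, so m_i and M_i are
    attained at the endpoints of that subinterval.
    """
    lower = upper = 0
    for left, right in zip(points, points[1:]):
        dx = right - left
        lower += min(f(left), f(right)) * dx
        upper += max(f(left), f(right)) * dx
    return lower, upper

s, S = lower_upper_sums(f, [2, 3, 4, 5, 6])
# Value from the antiderivative 75x^2/2 - x^4/4 (see Section 1.53).
exact = (75 * 6**2 / 2 - 6**4 / 4) - (75 * 2**2 / 2 - 2**4 / 4)
print(s, S, exact)  # 810 934 880.0
```

The subdivision used here places a point at $x = 5$, where $f$ has its maximum; with a subdivision that straddled the maximum, the endpoint shortcut would no longer give the true $M_i$.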

One of the simple but very important facts about integrals is expressed by the 
formula 

$$\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx, \tag{1.5-7}$$

where $a < b < c$, and $f$ is continuous on the interval $a \le x \le c$. The analytical
proof of this formula runs as follows: Divide the whole interval $[a, c]$ into parts
in such a way that $x = b$ is always one of the points of subdivision. Suppose that
$[a, b]$ is divided into $m$ parts and $[b, c]$ into $n$ parts:

$$a = x_0 < x_1 < \cdots < x_m = b = \xi_0 < \xi_1 < \cdots < \xi_n = c.$$

Choose points $x'_i$ and $\xi'_j$ such that $x_{i-1} \le x'_i \le x_i$, $i = 1, \ldots, m$, and
$\xi_{j-1} \le \xi'_j \le \xi_j$, $j = 1, \ldots, n$. Then, as $m$ and $n \to \infty$, and as the greatest of
the differences $\Delta x_i$, $\Delta\xi_j$ approaches zero, we have

$$\int_a^b f(x)\,dx = \lim \sum_{i=1}^{m} f(x'_i)\,\Delta x_i,$$

$$\int_b^c f(x)\,dx = \lim \sum_{j=1}^{n} f(\xi'_j)\,\Delta\xi_j.$$

But also,

$$\int_a^c f(x)\,dx = \lim \left\{ \sum_{i=1}^{m} f(x'_i)\,\Delta x_i + \sum_{j=1}^{n} f(\xi'_j)\,\Delta\xi_j \right\}.$$

Formula (1.5-7) then follows from the principle that the limit of a sum is equal to
the sum of the limits. We apply the principle to the sum of the two expressions

$$\sum_{i=1}^{m} f(x'_i)\,\Delta x_i \quad\text{and}\quad \sum_{j=1}^{n} f(\xi'_j)\,\Delta\xi_j.$$

There are various limiting processes used in calculus: limits of functions, 
limits of sequences, and the type of limit (of approximating sums) used in 
defining an integral. For each of these limiting processes there is the principle 
that the limit of a sum is equal to the sum of the limits. For limits of functions 
this principle was formulated in the rule (1.1-4). A proof of the validity of the 
rule (1.1-4) is given in §1.64. With slight formal modifications, the idea of this 
proof applies equally well to sequences and to limits of approximating sums. 
Basically, all these various forms of the principle are covered by the theorem 
that the process of addition is a continuous function of the things which are 
added (see Theorem VI, §17.5). 

Of course, the geometric interpretation of (1.5-7) is
very simple and obvious: The area between the curve
$y = f(x)$ and the x-axis from $a$ to $c$ is the algebraic sum of the
partial areas from $a$ to $b$ and from $b$ to $c$ (see Fig. 16).

We do not in practice compute the precise values
of definite integrals by direct application of the definition
(1.5-3). In some simple cases, however, we may be able
to find the precise value of the limit of the approximating
sums by a direct examination of the sums. Usually these direct procedures involve
subdivisions of the interval into equal parts.

Example 2. Find the value of $\int_a^b x^2\,dx$, assuming $0 < a < b$.

For convenience we write (by (1.5-7))

$$\int_a^b x^2\,dx = \int_0^b x^2\,dx - \int_0^a x^2\,dx.$$

We concentrate on finding the value of $\int_0^b x^2\,dx$.

Dividing $[0, b]$ into $n$ equal parts, we write

$$x_0 = 0, \quad x_1 = \frac{b}{n}, \quad x_2 = \frac{2b}{n}, \quad \ldots, \quad x_n = \frac{nb}{n} = b.$$

Here each $\Delta x_i = b/n$. We choose $x'_i = x_i$. Then

$$\sum_{i=1}^{n} f(x'_i)\,\Delta x_i = \sum_{i=1}^{n} x_i^2\,\Delta x_i = \frac{b^3}{n^3}\,(1^2 + 2^2 + \cdots + n^2).$$

There is a convenient formula for the sum of the squares of the integers from 1 to $n$
(see Exercise 6):

$$1^2 + 2^2 + \cdots + n^2 = \frac{n(n + 1)(2n + 1)}{6}. \tag{1.5-8}$$

Combining the foregoing observations, we see that

$$\int_0^b x^2\,dx = \lim_{n\to\infty} \frac{n(n + 1)(2n + 1)\,b^3}{6n^3}. \tag{1.5-9}$$

Now

$$\frac{n(n + 1)(2n + 1)}{n^3} = 2 + \frac{3}{n} + \frac{1}{n^2}, \tag{1.5-10}$$

so that, as $n \to \infty$, the expression on the right in (1.5-10) approaches 2 as a limit.
Hence, from (1.5-9),

$$\int_0^b x^2\,dx = \frac{2b^3}{6} = \frac{b^3}{3}.$$

Since $b$ was arbitrary, it follows that

$$\int_0^a x^2\,dx = \frac{a^3}{3}.$$

Therefore

$$\int_a^b x^2\,dx = \frac{b^3 - a^3}{3}.$$

EXERCISES 

1. (a) Using four equal subintervals, calculate upper and lower sums for the integral 
(x 5 6 -3x 2 + 3 )dx. 

(b) Repeat (a), using eight equal subintervals. 

(c) Calculate the value of the approximating sum (1.5-2), using four equal subintervals,
and taking $x'_k$ to be the midpoint of the $k$th subinterval.

2. (a) Calculate the value of the approximating sum (1.5-2) for the integral | — , 

J > x 

using six equal subintervals and taking $x'_k$ to be the midpoint of the $k$th subinterval.

(b) Calculate upper and lower sums for the integral in (a), using six equal subintervals. A 
table of reciprocals will be found convenient for this exercise. 

3. Follow the instructions of Exercise 1 as applied to the integral Jo(4x 2 -12x + 
10) dx. 

4. Apply the definition (1.5-3) to find the value of the integral $\int_a^b f(x)\,dx$ if $f(x) = c$,
where $c$ is a constant.

5. (a) Let $A_1(n) = 1 + 2 + \cdots + n$. Noting that $A_1(n) = n + (n - 1) + \cdots + 1$, show
that $2A_1(n) = n(n + 1)$.

(b) Using the formula for $A_1(n)$ found in (a), calculate $\int_a^b x\,dx$ by a method like that used
in Example 2.

6. Let $A_2(n) = 1^2 + 2^2 + \cdots + n^2$. Obtain a formula for $A_2(n)$ as follows: Start with

$$(p + 1)^3 - p^3 = 3p^2 + 3p + 1.$$

Write this out with $p = 0, 1, \ldots, n$, putting the results in order as shown.

$$\begin{aligned}
1^3 - 0^3 &= 3\cdot 0^2 + 3\cdot 0 + 1 \\
2^3 - 1^3 &= 3\cdot 1^2 + 3\cdot 1 + 1 \\
&\;\;\vdots \\
(n + 1)^3 - n^3 &= 3\cdot n^2 + 3\cdot n + 1.
\end{aligned}$$

Now add, noting the cancellations on the left, to obtain

$$(n + 1)^3 = 3A_2(n) + 3A_1(n) + (n + 1).$$

Use the formula for $A_1(n)$ from Exercise 5(a) to solve for $A_2(n)$, obtaining

$$A_2(n) = \frac{n(n + 1)(2n + 1)}{6}.$$

Imitate the foregoing procedure (a) to find the formula for $A_1(n)$ in this new
way; (b) to obtain a formula for $A_3(n) = 1^3 + 2^3 + \cdots + n^3$. Then use your result to
calculate $\int_a^b x^3\,dx$ by the method of Example 2.

7. Let f(x) = e x and assume b>0. Using n equal subintervals and a method 
somewhat like that of Example 2, show that 

where $h = b/n$. Use the definition of $f'(0)$ to calculate the value of the limit, and so find the
value of the integral.

8. Taking $f(x) = x$, and choosing $x'_i = (x_{i-1} + x_i)/2$, show that the approximating sum
(1.5-2) has a value which is independent of $n$ and the choice of the points $x_1, x_2, \ldots, x_{n-1}$.
From this result calculate $\int_a^b x\,dx$, using the definition (1.5-3).

9. The function $f(x)$ defined as $(\sin x)/x$ when $x \ne 0$ and $f(0) = 1$ is continuous for all
values of $x$. Why? Show that the value of $f(x)$ decreases steadily as $x$ increases from 0 to
$\pi/2$. Below is given a table of values of $\sin x$. Use it to obtain high and low estimates of
the value of $\int_0^{\pi/2} f(x)\,dx$.

    n             1       2       3       4       5       6       7
    sin(nπ/16)    0.1951  0.3827  0.5556  0.7071  0.8315  0.9239  0.9808


10. Calculate $\int_1^2 x^p\,dx$, where $p$ is a positive integer, by the following procedure. Divide
the interval $[1, 2]$ into $n$ parts by the points $x_0 = 1$, $x_1 = h$, $x_2 = h^2, \ldots, x_n = h^n$, where
$h = 2^{1/n}$, so that $h^n = 2$. This does not give subintervals of equal lengths, but of lengths in
geometric progression. Choose $x'_k = x_{k-1}$, and show that (using (1.5-3))

$$\int_1^2 x^p\,dx = (2^{p+1} - 1) \lim_{n\to\infty} \frac{1 - 2^{1/n}}{1 - 2^{(p+1)/n}}.$$

Use the formula for the sum of a geometric progression on $(1 - h)/(1 - h^{p+1})$, and thus
show that the limit in the above formula has the value $1/(p + 1)$. Make use of the fact that
$\lim_{n\to\infty} C^{1/n} = 1$ if $C > 0$. This is proved in §1.62, Exercise 16.

11. After careful study of Exercise 10, adapt the method to find the value of $\int_a^b x^p\,dx$
where $0 < a < b$ and $p$ is a positive integer. Start with $x_0 = a$, $x_1 = ah$, $x_2 = ah^2$, etc., where
$h = (b/a)^{1/n}$.


1.51 / THE MEAN VALUE THEOREM FOR INTEGRALS

The theorem which we prove in this section has an importance much like that of 
the law of the mean (Theorem IV, §1.2), in that it is valuable as a tool in 
systematic reasoning in calculus. We are introducing it here because of the use 
we shall have for it in our program of outlining the relation between
differentiation and integration.

THEOREM VI. (The mean value theorem for integrals.) Let $f$ be continuous on
the closed interval $[a, b]$. Then there is some number $X$ such that $a \le X \le b$
and

$$\int_a^b f(x)\,dx = (b - a)\,f(X). \tag{1.51-1}$$

Proof. Let $m$ and $M$ denote the minimum and maximum values of $f$ on the
interval. Consider the approximating sums (1.5-2) of Step 3 in the definition of the
integral (§1.5).

We have

$$m \le f(x'_i) \le M.$$

Therefore

$$m \sum_{i=1}^{n} \Delta x_i \le \sum_{i=1}^{n} f(x'_i)\,\Delta x_i \le M \sum_{i=1}^{n} \Delta x_i.$$

But evidently

$$\sum_{i=1}^{n} \Delta x_i = b - a.$$

Hence the approximating sum (1.5-2) lies between $m(b - a)$ and $M(b - a)$.
Consequently the definite integral, being the limit of the approximating sums,
must also lie between these two numbers; that is,

$$m(b - a) \le \int_a^b f(x)\,dx \le M(b - a),$$

or

$$m \le \frac{1}{b - a}\int_a^b f(x)\,dx \le M. \tag{1.51-2}$$

Let us set

$$\mu = \frac{1}{b - a}\int_a^b f(x)\,dx. \tag{1.51-3}$$

We now reason that $f(x)$ must take on the value $\mu$ at some point $x = X$,
$a \le X \le b$, since by (1.51-2) $\mu$ lies between the smallest and largest values of
$f(x)$ on the interval. Once this argument is accepted, the proof is complete, for
the equation $f(X) = \mu$ is equivalent to (1.51-1), in view of the definition of $\mu$.

The existence of $X$ such that $f(X) = \mu$ depends upon the hypothesis that $f$ is
continuous. Such existence is made plausible by intuitive consideration of the
variation in value of a continuous function. An indubitable proof of the
existence of $X$ must await our systematic consideration of the properties of
continuous functions (in Chapter 3). The remarks which we made about the
existence of $m$ and $M$ in connection with the proof of Rolle’s theorem (§1.2)
apply equally here to the existence of $X$.

The number $\mu$ defined by (1.51-3) is called the average value of the function
$f(x)$ on the interval $[a, b]$. The sense in which this concept of average value is an
extension of the simple notion of the arithmetic mean as an average value is
indicated in Exercise 1.
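
For a concrete function the number $X$ of Theorem VI can be located numerically. The sketch below rests on several assumptions made only for illustration: the integrand is $f(x) = x^2$ on $[0, 3]$, the integral is approximated by a midpoint sum, and $X$ is found by bisection, which is legitimate here because this particular $f$ is increasing on the interval. The helper names are hypothetical.

```python
import math

def midpoint_integral(f, a, b, n=10000):
    """Approximating sum (1.5-2) with equal subintervals and midpoints."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def solve_increasing(f, target, a, b, tol=1e-12):
    """Bisection for f(X) = target, assuming f continuous and increasing on [a, b]."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

f = lambda x: x * x
a, b = 0.0, 3.0
mu = midpoint_integral(f, a, b) / (b - a)   # average value, as in (1.51-3)
X = solve_increasing(f, mu, a, b)           # point where f(X) = mu

print("average value mu is about", mu)
print("X is about", X, "; sqrt(3) =", math.sqrt(3.0))
```

Here $\int_0^3 x^2\,dx = 9$, so $\mu = 3$ and $X = \sqrt{3}$, which lies inside the interval as the theorem asserts. For an integrand that is not monotonic there may be several such points, and bisection as written would not be appropriate.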

EXERCISES 

1. Let $[a, b]$ be divided into $n$ equal parts, and let $y_i$ be the value of $f(x)$ at the
midpoint of the $i$th subinterval. The arithmetic mean of $y_1, \ldots, y_n$ is

$$A_n = \frac{y_1 + \cdots + y_n}{n}.$$

Show that $\mu = \lim_{n\to\infty} A_n$.

2. By interpreting the integral as an area, calculate the average value of $f(x) =
\sqrt{a^2 - x^2}$ on the interval $-a \le x \le a$.

3. A right circular cone of altitude $H$ and radius of base $R$ has its axis along the
x-axis. For a given value of $x$ let $A(x)$ denote the area of cross section of the cone by a
plane perpendicular to the x-axis at that point. What is the average value of $A(x)$, $x$
ranging over all values for which the plane cuts the cone?

4. In Theorem VI it was asserted that an $X$ can be found on the closed interval $[a, b]$
such that (1.51-1) holds. If $f(x)$ is constant on $[a, b]$, say $f(x) = C$, then $X$ may be taken
as any point of the closed interval, for in that case $\int_a^b f(x)\,dx = C(b - a)$, and $C = f(X)$,
no matter how we choose $X$. Hence certainly we can choose $X$ so that $a < X < b$. Show
that this can also be done if $f(x)$ is not constant on $[a, b]$. State precisely what you are
taking for granted about continuous functions.


1.52 / VARIABLE LIMITS OF INTEGRATION 

Before coming to the main subject of this section it will be well to consider a 
matter of notation. In the symbolic expression 

$$\int_a^b f(x)\,dx$$

we refer to x as the variable of integration . The value of the integral does not 
depend upon the letter which is used for the variable of integration. For 
example, 

f x 3 dx = f t 3 dt = f u 3 du. 

Jo Jo Jo 

In cases where the limits of integration are literal symbols it is important to 
avoid using the same letter for a limit of integration and also for the variable of 
integration. The danger of confusion is made apparent by asking: What is the
value of the expression

$$\int_0^x x^3\,dx$$

when $x = 2$?

Let us now suppose that we have a function $f(t)$ which is continuous on the
interval $a \le t \le b$. Regarding $x$ as a variable, consider the function $F(x)$ defined
by the integral

$$F(x) = \int_a^x f(t)\,dt. \tag{1.52-1}$$

It is clear that the integral does actually define a function of the upper limit $x$
provided $a \le x \le b$, for once the function $f(t)$ and the lower limit $t = a$ are
chosen and fixed, the integral has a definite numerical
value which depends only on $x$. If we resort to the
interpretation of the integral as an area, we may obtain
a geometrical representation of the function $F(x)$ as
the area under the curve $y = f(t)$ from $t = a$ to $t = x$
(see Fig. 17). It is natural to complete the definition
(1.52-1) by setting $F(a) = 0$. As a matter of fact, it is
usual to define

$$\int_a^a f(t)\,dt = 0 \tag{1.52-2}$$

and

$$\int_b^a f(t)\,dt = -\int_a^b f(t)\,dt, \qquad a < b. \tag{1.52-3}$$

These formalities are convenient when dealing with
integrals as functions of the limits of integration. With
these two formulas available it is not difficult to see
that formula (1.5-7) is valid for any positions of $a$, $b$, $c$
on an interval where $f$ is continuous.

A graph of y = F(x), corresponding to a graph of 
y = f(t) as shown in Fig. 17, would appear somewhat 
as in Fig. 18. 

We are going to be primarily interested in the 
derivative of the function F(x) defined by the integral 
(1.52-1). 



THEOREM VII. Let $f(t)$ be continuous, $a \le t \le b$, and define

$$F(x) = \int_a^x f(t)\,dt, \qquad a \le x \le b.$$

Then $F(x)$ is differentiable, with derivative

$$F'(x) = f(x). \tag{1.52-4}$$

The formula (1.52-4) may be put verbally in the form: the derivative of the
definite integral of a continuous function with respect to the upper limit of
integration is equal to the value of the integrand function at this upper limit.


Proof. We consider two points $x$, $x + \Delta x$ of the interval $[a, b]$. Then

$$F(x + \Delta x) - F(x) = \int_a^{x+\Delta x} f(t)\,dt - \int_a^x f(t)\,dt = \int_x^{x+\Delta x} f(t)\,dt.$$

By the mean value theorem for integrals (Theorem VI, §1.51) we have

$$\int_x^{x+\Delta x} f(t)\,dt = \Delta x \cdot f(X),$$

where $X$ is some number between $x$ and $x + \Delta x$. Thus we see that

$$\frac{F(x + \Delta x) - F(x)}{\Delta x} = f(X).$$

In this equation we now hold $x$ fixed and make $\Delta x$ approach zero. Then $X$
approaches $x$, and $f(X)$ approaches $f(x)$. Therefore

$$\lim_{\Delta x \to 0} \frac{F(x + \Delta x) - F(x)}{\Delta x} = f(x).$$

The limit on the left is $F'(x)$, by definition. Thus (1.52-4) is established. It is clear
from the proof that if $x$ is at one end of the interval $[a, b]$, $\Delta x$ must be restricted
to have but one sign (e.g., $\Delta x > 0$ if $x = a$). In this case $F'(x)$ is a one-sided
derivative.
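
Theorem VII can be illustrated with an integrand that has no elementary antiderivative. In the sketch below the choices are assumptions made for illustration: $f(t) = e^{-t^2}$, lower limit $a = 0$, the point $x = 1$, and a midpoint sum standing in for the integral. The difference quotient of $F$ is printed for shrinking $\Delta x$ beside the value $f(x)$ that the theorem predicts.

```python
import math

def F(x, f, a=0.0, n=20000):
    """Midpoint-sum approximation to the integral of f from a to x, as in (1.52-1)."""
    dx = (x - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

f = lambda t: math.exp(-t * t)

def difference_quotient(x, delta):
    """(F(x + delta) - F(x)) / delta, the quotient used in the proof."""
    return (F(x + delta, f) - F(x, f)) / delta

x = 1.0
quotients = [(delta, difference_quotient(x, delta)) for delta in (0.1, 0.01, 0.001)]
for delta, q in quotients:
    print(f"delta = {delta:<6} quotient = {q:.6f}   f(x) = {f(x):.6f}")
```

The quotient equals $f(X)$ for some $X$ between $x$ and $x + \Delta x$, which is why it drifts toward $f(1) = e^{-1}$ as $\Delta x$ shrinks.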


Theorem VII furnishes us with a complete solution to the problem raised in
§1.4 in connection with (1.4-1). If $f(x)$ is a given function, continuous on the
closed interval $[a, b]$, then the general solution of the equation

$$\frac{dy}{dx} = f(x)$$

on this interval is

$$y = \int_a^x f(t)\,dt + C,$$

where $C$ is an arbitrary constant. This follows at once from Theorem VII and
the italicized statement accompanying (1.4-6).

EXERCISES

1. The functions /, g, h are assumed to be continuous for all values of their 
independent variables. Complete each of the following equations: 

(®) lu l dx= (b) ^ l g(s) ds = (c) G( y ) = / hG ~> dt ’ G '(y) = 

2. If <t>(t) = /„'(3 + X 3 )-' 1,2 dx, find (a) <t>'(0), (b) 4>'(1), (c) 

3. If F(x) = f^dy. find (a) (b) (c) FU). 

4. If G(x) = /- 2 |s| ds, find (a) G'(-l), (b) G'(0), (c) G'(2), (d ) G'(a). 

5. (a) If $F(x) = \int_0^x t(t - 1)e^{-t^2}\,dt$, find the points of relative maxima and minima of
$F(x)$. (b) What is the value of $F(0)$? (c) For what values of $x$ is $F'(x) > 0$, and for
what values of $x$ is $F'(x) < 0$?

6. If F(x ) = Jo Fe ^ dt, find the absolute minimum value of F(jc). 


1.53 / THE INTEGRAL OF A DERIVATIVE 

The theorem which we shall prove in this section is fundamental, for it 
establishes the standard technique whereby definite integrals are calculated in 
practice. The four-step defining process of arriving at a definite integral, as set 
forth in §1.5, is difficult to apply. For a large and important class of integrands 
the following theorem provides a convenient method of finding the value of the 
integral. 

THEOREM VIII. Let $f$ be a given function continuous on the closed interval
$[a, b]$. Suppose that $F$ is any differentiable function such that $F'(x) = f(x)$
when $a \le x \le b$. Then

$$\int_a^b f(x)\,dx = F(b) - F(a). \tag{1.53-1}$$

Proof. We are by hypothesis given a function $F(x)$ whose derivative is $f(x)$.
By Theorem VII (§1.52) we know another function with this same derivative,
namely

$$\int_a^x f(t)\,dt.$$

Thus the function

$$G(x) = F(x) - \int_a^x f(t)\,dt$$

is constant, by Theorem V (§1.2), since its derivative is zero.
Now

$$G(a) = F(a) - 0,$$

by (1.52-2). Also,

$$G(b) = F(b) - \int_a^b f(t)\,dt.$$

Since $G(x)$ is constant we have $G(b) = G(a)$, or

$$F(b) - \int_a^b f(t)\,dt = F(a).$$

This result is equivalent to (1.53-1), so the proof is complete.


Example 1. Find the value of the integral $\int_0^{\pi/2} \sin x\,dx$.

Applying Theorem VIII, we seek a function of $x$ whose derivative is $\sin x$. Such
a function is $-\cos x$. Therefore

$$\int_0^{\pi/2} \sin x\,dx = -\cos\frac{\pi}{2} + \cos 0 = 1.$$


We are now in a position to see clearly the connection between differentiation 
and integration. As concepts, by their definition, these processes are quite 
independent of each other. It turns out, however, that each process is in a 
certain sense inverse to the other. The two aspects of this mutual inverseness 
are displayed by Theorems VII and VIII. If we want a function defined when
$a \le x \le b$ and having as its derivative a certain given continuous function $f(x)$,
the class of all functions satisfying our want is the family $\int_a^x f(t)\,dt + C$. If, on the
other hand, we wish to integrate a given continuous function $f(x)$, we can do so
by the formula

$$\int_a^b f(x)\,dx = F(b) - F(a)$$

provided we can find a function $F(x)$ having $f(x)$ as its derivative at all points of
the interval $[a, b]$.


Example 2. Evaluate the integral $\int_{-10}^{-2} \frac{dx}{x}$.

We seek a function whose derivative is $1/x$ when $-10 \le x \le -2$. The familiar
formula

$$\frac{d}{dx}\log x = \frac{1}{x} \tag{1.53-2}$$

will not quite do, for $\log x$ is not defined if $x < 0$. But, if $x < 0$, $\log(-x)$ is defined,
and

$$\frac{d}{dx}\log(-x) = \frac{1}{-x}\,(-1) = \frac{1}{x}. \tag{1.53-3}$$

Hence, by Theorem VIII, with $f(x) = 1/x$, $F(x) = \log(-x)$, we have

$$\int_{-10}^{-2} \frac{dx}{x} = \log(-x)\Big|_{-10}^{-2} = \log 2 - \log 10 = -\log 5.$$

The formulas (1.53-2) and (1.53-3) can be combined in the single formula

$$\frac{d}{dx}\log|x| = \frac{1}{x} \quad\text{if } x \ne 0. \tag{1.53-4}$$
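
The value found in Example 2 can be compared with a direct approximating sum. The sketch below is illustrative only: `midpoint_integral` is a hypothetical helper that forms the sum (1.5-2) with equal subintervals and midpoints, and the number of subintervals is an arbitrary choice.

```python
import math

def midpoint_integral(f, a, b, n=100000):
    """Approximating sum (1.5-2) with equal subintervals and midpoints."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Direct sum for the integral of 1/x from -10 to -2.
numeric = midpoint_integral(lambda x: 1.0 / x, -10.0, -2.0)
# Theorem VIII with F(x) = log(-x): F(-2) - F(-10) = log 2 - log 10.
by_theorem = math.log(2.0) - math.log(10.0)

print("approximating sum:", numeric)
print("log 2 - log 10   :", by_theorem)
print("-log 5           :", -math.log(5.0))
```

The interval $[-10, -2]$ does not contain $x = 0$, where $1/x$ fails to be continuous; Theorem VIII gives no warrant for the same computation across an interval containing 0.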

We now consider a theorem which is much used in transforming definite
integrals by substitution.

THEOREM IX. Suppose the following conditions are fulfilled:

(1) $f(x)$ is continuous, $a \le x \le b$;

(2) $\phi(t)$ and $\phi'(t)$ are continuous, and $a \le \phi(t) \le b$ when $\alpha \le t \le \beta$;

(3) $\phi(\alpha) = a$, $\phi(\beta) = b$.

Let $g(t) = f(\phi(t))\,\phi'(t)$. Then

$$\int_a^b f(x)\,dx = \int_\alpha^\beta g(t)\,dt. \tag{1.53-5}$$

We note that the formula (1.53-5) is easily remembered in the following way:
when $x = \phi(t)$, $dx = \phi'(t)\,dt$, so that $f(x)\,dx = g(t)\,dt$. The limits $x = a$, $x = b$
correspond to the limits $t = \alpha$, $t = \beta$ by (3). The proof of the theorem is left to
the student (Exercise 10).
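
Formula (1.53-5) can be tried on a specific substitution. The example below is an assumption chosen for illustration, not taken from the text: $f(x) = e^x$ on $[0, 4]$ with $\phi(t) = t^2$, $\alpha = 0$, $\beta = 2$, so that $\phi(\alpha) = 0$, $\phi(\beta) = 4$, and $g(t) = 2t\,e^{t^2}$. Both sides are approximated by midpoint sums using a hypothetical helper.

```python
import math

def midpoint_integral(f, a, b, n=100000):
    """Approximating sum (1.5-2) with equal subintervals and midpoints."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

f = math.exp                      # f(x) = e^x, continuous on [0, 4]
phi = lambda t: t * t             # phi(t) = t^2 maps [0, 2] into [0, 4]
phi_prime = lambda t: 2.0 * t     # phi'(t), continuous on [0, 2]
g = lambda t: f(phi(t)) * phi_prime(t)

alpha, beta = 0.0, 2.0
a, b = phi(alpha), phi(beta)      # condition (3): a = 0, b = 4

left_side = midpoint_integral(f, a, b)          # integral of f(x) dx over [a, b]
right_side = midpoint_integral(g, alpha, beta)  # integral of g(t) dt over [alpha, beta]

print("integral of f(x) dx :", left_side)
print("integral of g(t) dt :", right_side)
print("e^4 - 1             :", math.exp(4.0) - 1.0)
```

The two sums agree to within the accuracy of the approximation, and both are close to $e^4 - 1$, the value Theorem VIII gives for the left side.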


EXERCISES 

1. What is wrong with the following equations? 


(a) f£- i 

2 if 77 

= — i — 1 = — i. (b) sec 2 xdx = tan x 

J- 1 X X 

o 

—> 

T 


= 0 . 


2. Show that $\frac{d}{dx}\log|x - c| = \frac{1}{x - c}$. Under what restrictions on $a$, $b$, $c$ is the formula

$$\int_a^b \frac{dx}{x - c} = \log\left|\frac{b - c}{a - c}\right|$$

correct?


3. Find the values of the following integrals: 




dx 


4 x — 2’ 

-3 


( 10 dx . f 4 xdx 

(b) L (c) i F^25 ; 


« £ ^ (e) j-i 9-~P’ (t) l 


10 


dx 


4. Show that -r~ log 
dx 


a + x 
a - x 


2 a 


9- x 2 ‘ 

;3 if x 2 a 2 , 


and hence that 





a + x 2 
a — x 2 


a- x, 

a + Xy 


} 


provided that $x_1$ and $x_2$ are not separated by either of the points $x = a$, $x = -a$.

5. The student will need to recall that, by the standard conventions about principal
values of the inverse sine and inverse tangent, $y = \sin^{-1} x$ is the unique $y$ such that
$x = \sin y$ and $-\pi/2 \le y \le \pi/2$, while $y = \tan^{-1} x$ is the unique $y$ such that $x = \tan y$ and
$-\pi/2 < y < \pi/2$. Find the values of the following integrals:



dx 




1 dx 
-vj 3 + x 2 



dx 


Vl6-x r 
6. Show that

$$\tan^{-1} x = \int_0^x \frac{dt}{1 + t^2}; \qquad \sin^{-1} x = \int_0^x \frac{dt}{\sqrt{1 - t^2}}, \quad -1 < x < 1.$$

7. Show that

$$\int_{x_1}^{x_2} \tan x\,dx = \log \frac{\cos x_1}{\cos x_2}, \quad -\frac{\pi}{2} < x_1 \le x_2 < \frac{\pi}{2}.$$


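8. Standard tables of integrals list the formula

$$\int \frac{dx}{a + b \cos x} = \frac{2}{\sqrt{a^2 - b^2}} \tan^{-1} \left( \sqrt{\frac{a - b}{a + b}} \tan \frac{x}{2} \right), \quad a^2 > b^2.$$

(a) Let

$$F(x) = \frac{2}{\sqrt{a^2 - b^2}} \tan^{-1} \left( \sqrt{\frac{a - b}{a + b}} \tan \frac{x}{2} \right)$$

and verify that

$$F'(x) = \frac{1}{a + b \cos x}$$

whenever $x$ is not an odd multiple of $\pi$. What can be said about $F(x)$ for these
exceptional values of $x$?

(b) Assuming $0 < b < a$, find the limit of $F(x)$ as $x \to \pi$ from the left; as $x \to \pi$ from the
right.

(c) Considering $F(x)$ as defined at $\pi$ by its limit as $x \to \pi$ from the left, use Theorem VIII to
show that

$$\int_0^{\pi} \frac{dx}{a + b \cos x} = \frac{\pi}{\sqrt{a^2 - b^2}} \quad \text{if } a > b > 0.$$

(d) Find the value of $\int_0^{3\pi/2} \frac{dx}{5 - 3 \cos x}$.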

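9. (a) Discuss critically the integration formula

$$\int \frac{dx}{a^2 \cos^2 x + b^2 \sin^2 x} = \frac{1}{ab} \tan^{-1} \left( \frac{b}{a} \tan x \right),$$

assuming that $a$ and $b$ are positive. Compare with Exercise 8 (a), (b).

(b) Explain the reason for the apparent failure of Theorem VIII in the obviously false
result

$$\int_0^{\pi} \frac{dx}{\cos^2 x + 4 \sin^2 x} = \tfrac{1}{2} \tan^{-1}(2 \tan x) \Big|_0^{\pi} = \tfrac{1}{2} \tan^{-1}(0) - \tfrac{1}{2} \tan^{-1}(0) = 0.$$

(c) Show that

$$\int_0^{\pi/2} \frac{dx}{a^2 \cos^2 x + b^2 \sin^2 x} = \frac{\pi}{2ab}.$$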


10. Prove Theorem IX with the aid of the following suggestions: Define $F(x) = \int_a^x f(s)\,ds$,
$H(t) = F(\phi(t))$. Show that $H'(t) = g(t)$ (by what theorem?). Then $H(\beta) - H(\alpha) = \int_\alpha^\beta g(t)\,dt$
(by what theorem?). Now complete the proof of Theorem IX.

11. Show that $\int_0^{\pi/2} \sin^n x\,dx = \int_0^{\pi/2} \cos^n y\,dy$ by an appropriate substitution.

12. Show that $\int_0^1 x^m (1 - x)^n\,dx = \int_0^1 x^n (1 - x)^m\,dx$ by an appropriate substitution.

13. If $f$ is continuous on $[a, b]$, show that $\int_a^b f(x)\,dx = \int_a^b f(a + b - x)\,dx$.

14. If $\phi$ is a continuous function on $[0, 1]$, show that $\int_0^{\pi/2} \phi(\sin x)\,dx = \int_{\pi/2}^{\pi} \phi(\sin x)\,dx$,
and hence that $\int_0^{\pi} \phi(\sin x)\,dx = 2 \int_0^{\pi/2} \phi(\sin x)\,dx$.

15. Show that the substitution $x = a \cos^2 t + b \sin^2 t$ fulfills condition (2) of Theorem
IX for the integral $\int_a^b \sqrt{(x - a)(b - x)}\,dx$. Use the substitution to find the value of the
integral.

16. Show that $\int_{-a}^{a} f(x)\,dx = \int_0^a [f(x) + f(-x)]\,dx$ if $f$ is continuous on $[-a, a]$. What do
you conclude about the value of the integral if $f$ is an odd function? an even function? (We
call $f$ even if $f(-x) = f(x)$ for all $x$, and odd if $f(-x) = -f(x)$.)

17. Show that $\int_a^b x f''(x)\,dx = b f'(b) - f(b) + f(a) - a f'(a)$.

18. Show that, if $f$ is continuous on $[0, 1]$, $\int_0^{\pi} x f(\sin x)\,dx = (\pi/2) \int_0^{\pi} f(\sin x)\,dx$. Use
this result to find the value of $\int_0^{\pi} \frac{x \sin x}{1 + \cos^2 x}\,dx$.


1.6 / LIMITS 

Most students of calculus get their first extensive experience with the limit 
concept in the course of learning about differentiation. In the process of working 
out formulas for the derivatives of $x^n$, $\sin x$, and other elementary functions, as
well as in the establishment of the rules for differentiating sums, products, and 
quotients, students are taught to use some of the fundamental theorems about 
limits, such as: 

The limit of a sum is equal to the sum of the limits. 

The limit of a product is equal to the product of the limits. 

The limit of a quotient is equal to the quotient of the limits, provided the limit of the 
denominator is not zero. 

These are not fully and precisely formulated theorems in the form here 
given; nevertheless, each statement conveys the central idea of an important 
theorem about limits. No doubt most students accept the truth of these three 
propositions as being intuitively evident. Probably the most that can be expec- 
ted, perhaps all that is desirable, in an elementary course in calculus, is the 
cultivation in the student of an awareness that these propositions exist and that 
it is necessary to appeal to them in building up the structure of calculus. As we 
proceed to a more advanced level, however, it becomes more important for us to 
analyze the limit concept carefully, and to see how the whole theory of limiting 
processes is developed. As with any part of mathematics, we cannot build a 
clear and precise theory of limits unless we formulate our basic definitions in 
terms sharp enough for use in the giving of clean-cut proofs. 

There are at least three recognizably distinct limit concepts in elementary
calculus. We describe them briefly in turn.

(1) The limit of a function of a continuous variable: $\lim_{x \to x_0} f(x)$. As examples
of this kind of limit we cite

(a) lim-^ = 2, (b) lim^^= 1, (c) lim log c x = 1. 

*-►1 X t L *->0 X x->e 


The limits (1.11-2)-(1.11-4) defining a derivative are also of this type. The
definition of this kind of limit was given in §1.1 (see 1.1-2).

(2) The limit of a sequence of numbers: $\lim_{n \to \infty} s_n$.

As examples we may choose 


(a) lim 


2 n 


n + 1 


= 2 , 


(b) Hm sin(nW2) = 


(c) lim 2 l/n = 1, (d) lim(l+-V = e. 

Here the variable n is discrete , running through the natural numbers 1, 2, 3, ... . 
We have not yet defined this kind of limit formally. 

(3) The type of limit occurring in the definition of an integral:

$$\lim \sum_{i=1}^{n} f(x_i')\,\Delta x_i = \int_a^b f(x)\,dx$$

(in the notation of §1.5). Here the variable quantity is an approximating sum; it
does not depend merely on $n$, nor does it depend on a single continuous variable
$x$. This kind of limit is therefore different from either of the types (1) and (2).
We shall now discuss these three types of limit in more detail. 


1.61 / LIMITS OF FUNCTIONS OF A CONTINUOUS VARIABLE 

In speaking of $\lim_{x \to x_0} f(x)$ it is usually understood that there is some open
interval $(a, b)$ such that $a < x_0 < b$, and such that $f$ is defined at each point of the
interval except possibly at $x_0$ itself. We repeat the definition of a limit from §1.1:

Definition. The function $f$ has the limit $A$ (a certain real number) as $x \to x_0$
provided that to each positive number $\epsilon$ there corresponds some positive number
$\delta$ such that

$$|f(x) - A| < \epsilon \quad \text{if } 0 < |x - x_0| < \delta \tag{1.61-1}$$

and if $f$ is defined at $x$.

In certain cases f may be defined only on one side of x 0 ; then we speak of 
one-sided limits. 

In all these cases we speak of x as a continuous variable , because it is free 
to assume all values on certain intervals of the real-number scale. The adjective 
“continuous” here contrasts with “discrete”. 

The foregoing definition of limit was first used systematically in the foundations
of calculus by the French mathematician Augustin-Louis Cauchy (1789-1857).
Prior to Cauchy there had been various attacks on the problem of putting
the fundamental concepts of the calculus on a sound basis. During the period 
from Newton and Leibniz to the work of Cauchy (roughly 1665-1821) a good 
part of the formal side of calculus and the technique of applying it to physics 
and geometry had been developed, but the reasoning was often hazy and 
dependent upon intuition rather than logic. The work of clarifying the fun- 
damentals and establishing a satisfactory standard of logical rigor did not end 
with Cauchy, of course. A more adequate understanding of the real number 
system had yet to come. 

We have put in this brief reference to mathematical history in order to draw 
a parallel. The student’s understanding of calculus will normally pass through 
stages of development not unlike the historical ones, but with a difference. The 
student need not embrace all or even many of the misconceptions which have 
been nourished about calculus in the long evolution of the subject since the time 
of Newton and Leibniz, provided he will put aside preconceived notions of 
fundamental mathematical concepts and base his understanding on careful study 
of modern definitions and theorems. Intuition and experience must play their 
part in learning, of course. 

The notation for a limit is frequently employed in conjunction with the use
of the symbols $+\infty$, $-\infty$. These symbols are associated with the words “infinity”
and “infinite,” with which the student no doubt already has some familiarity.
Our present object is to discuss the meanings of such symbolic assertions as

$$\lim_{x \to x_0} f(x) = +\infty, \qquad \lim_{x \to x_0} f(x) = -\infty.$$

These statements are given various verbal renderings. The first one may be put
in the form “$f(x)$ becomes positively infinite (or approaches plus infinity) as
$x \to x_0$.” For the second statement “positively” is replaced by “negatively,” and
“plus” by “minus.” The meanings of these statements are contained in the following
definitions:

Definition. We write $\lim_{x \to x_0} f(x) = +\infty$ if to every $M > 0$ there corresponds some
$\delta > 0$ such that $M < f(x)$ whenever $0 < |x - x_0| < \delta$. We write $\lim_{x \to x_0} f(x) = -\infty$ if
to every $M > 0$ there corresponds some $\delta > 0$ such that $f(x) < -M$ whenever
$0 < |x - x_0| < \delta$.


The definitions are modified in obvious ways for one-sided approach of $x$ to
$x_0$.

Example 1. $\lim_{x \to 0} x^{-2} = +\infty$.

To obtain this result from the definition, we consider the inequality
$M < 1/x^2$, where $M$ is any positive number. An equivalent inequality is $x^2 < 1/M$,
or $|x| < M^{-1/2}$. Thus we may choose $\delta = M^{-1/2}$ and the assertion stated in
Example 1 is seen to be true by definition.

A possible misconception of the definition may be allayed by the following
example:

Example 2. Consider $f(x) = (1/x^2) \sin^2(\pi/x)$ as $x \to 0$.

In this case it is not true that $\lim_{x \to 0} f(x) = +\infty$, in spite of the fact that $f$
takes on values as large as we please as near 0 as we please. For, considering the
definition, choose $M = 1$. The inequality $(1/x^2) \sin^2(\pi/x) > 1$ is true for some
values of $x$ very close to 0, e.g., $x = \tfrac{2}{7}, \tfrac{2}{9}, \ldots$; but for certain other values of $x$
near 0 it is not true, e.g., $x = \tfrac{1}{2}, \tfrac{1}{3}, \ldots, 1/n, \ldots$.
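
The contrast between Examples 1 and 2 can be seen numerically. The following Python sketch is an illustration only; the sample points and the names `delta_for`, `f1`, `f2` are our own choices. It checks that $\delta = M^{-1/2}$ works in Example 1 on a grid of sample points, and that in Example 2 the function is large at points $x = 2/(2k+1)$ but essentially zero at points $x = 1/n$.

```python
import math

def delta_for(M):
    """The delta chosen in Example 1 for a given M > 0."""
    return M ** -0.5

def f1(x):
    return 1.0 / (x * x)

def f2(x):
    return math.sin(math.pi / x) ** 2 / (x * x)

# Example 1: every sampled x with 0 < |x| < delta satisfies f1(x) > M.
M = 1.0e6
delta = delta_for(M)
samples = [delta * k / 1000.0 for k in range(1, 1000)]
example1_ok = all(f1(x) > M and f1(-x) > M for x in samples)

# Example 2: large values at x = 2/7, 2/101, 2/1001 ...
peaks = [f2(2.0 / (2 * k + 1)) for k in (3, 50, 500)]
# ... but values near zero (up to rounding error) at x = 1/2, 1/3, 1/100, 1/1000.
zeros = [f2(1.0 / n) for n in (2, 3, 100, 1000)]

print(example1_ok, peaks, zeros)
```

Because $f$ in Example 2 keeps returning to 0 arbitrarily near $x = 0$, no $\delta$ can serve for $M = 1$, which is exactly the point of the example.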

The symbols $+\infty$, $-\infty$ are also used in connection with the independent
variable. We use $x \to +\infty$ as a symbolic equivalent of the phrase “$x$ tends to plus
infinity” (or “$x$ becomes positively infinite”). This occurs in such contexts as
$\lim_{x \to +\infty} f(x) = A$, which by definition means “to each $\epsilon > 0$ there corresponds an
$M > 0$ such that $|f(x) - A| < \epsilon$ whenever $M < x$.”

Example 3. $\lim_{x \to +\infty} (2x - 1)/(x - 3) = 2$.

To verify this assertion by direct application of the definition, we proceed as
follows:

$$\frac{2x - 1}{x - 3} - 2 = \frac{2x - 1 - 2x + 6}{x - 3} = \frac{5}{x - 3}.$$

Therefore, if $x > 3$ we have

$$\left| \frac{2x - 1}{x - 3} - 2 \right| = \frac{5}{x - 3} < \epsilon$$

provided $(5/\epsilon) < x - 3$, or $3 + (5/\epsilon) < x$. This shows that, for given $\epsilon > 0$, the
conditions of the definition are fulfilled with $M = 3 + (5/\epsilon)$.
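
A short Python check of the choice $M = 3 + (5/\epsilon)$ in Example 3 is sketched below. It samples only a few points $x > M$, so it illustrates the estimate rather than proving it; the helper names and sample points are assumptions of this sketch.

```python
def f(x):
    return (2.0 * x - 1.0) / (x - 3.0)

def bound_for(eps):
    """The M of Example 3: M = 3 + 5/eps."""
    return 3.0 + 5.0 / eps

results = {}
for eps in (1.0, 0.1, 1e-3):
    M = bound_for(eps)
    # Sample points to the right of M, both close to M and far away.
    xs = [M + 10.0 ** k for k in range(-3, 6)]
    results[eps] = all(abs(f(x) - 2.0) < eps for x in xs)

print(results)
```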

The following theorem states explicitly an important principle which is 
frequently used in mathematical arguments. 

THEOREM X. Let $f(x)$ be defined at all points except $x = a$ of some open
interval containing that point, and suppose that, as $x$ tends to $a$, $f(x)$
approaches a limit which is positive. Then $f(x)$ is positive when $x$ is
sufficiently near the point $a$.

Proof. Let the limit of $f(x)$ be $A$. Then, according to the definition, if any
positive number $\epsilon$ is given, there is some corresponding positive number $\delta$ such
that $|f(x) - A| < \epsilon$ when $0 < |x - a| < \delta$. Now $A > 0$, by hypothesis. Suppose that
we take $A/2$ for the $\epsilon$. Then there is a certain $\delta$ such that $|f(x) - A| < A/2$ if
$0 < |x - a| < \delta$. Now $|f(x) - A| < A/2$ is equivalent to the double inequality
$-A/2 < f(x) - A < A/2$, as the student will easily see. Using the left one of these
last two inequalities, and transposing, we see that $-A/2 + A < f(x)$ when
$0 < |x - a| < \delta$. Certainly then $f(x)$ is positive, for $A/2 > 0$. This completes the proof.

Example 4. As an illustration of the use of Theorem X, we shall prove the
following statement: Let $f$ be defined in some open interval containing $x = a$, and
suppose that $f'(a)$ exists and is positive. Then, for points $x_1$, $x_2$ sufficiently close to
$x = a$, $x_1 < a < x_2$ implies

$$f(x_1) < f(a) < f(x_2). \tag{1.61-2}$$

To prove this assertion, consider the definition

$$f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}.$$

We are assuming that $f'(a) > 0$; consequently, by Theorem X,

$$\frac{f(x) - f(a)}{x - a} > 0 \tag{1.61-3}$$

when $x$ is sufficiently close to $a$. But from (1.61-3) we infer that $f(x) > f(a)$ if
$x > a$, and $f(x) < f(a)$ if $x < a$. Thus the assertion about the inequalities (1.61-2)
is seen to be true.


We shall state two further theorems which, like Theorem X, are frequently 
used in the kind of reasoning with limits which occurs regularly in calculus. 

THEOREM XI. Let $f$ be defined at all points except $x = a$ of some open interval
containing that point. Suppose that there is a number $M$ such that $f(x) \le M$
when $x$ is sufficiently near $a$. Further suppose that $\lim_{x \to a} f(x) = A$. Then
$A \le M$.

THEOREM XII. Let $f$, $g$, $h$ be functions of $x$ defined at all points except $x = a$ of
some open interval containing that point. Suppose that $f(x) \le g(x) \le h(x)$,
and suppose that the limits $\lim_{x \to a} f(x)$, $\lim_{x \to a} h(x)$ exist and are equal. Then
$\lim_{x \to a} g(x)$ exists also, and all three limits are equal.

We leave the proofs of these theorems as exercises for the student, but we
shall use the theorems whenever the need arises. These theorems have analogues
for other kinds of limits (e.g., limits of sequences and the limits defining
definite integrals). For example, the proof of (1.51-2) employed a principle
similar to that of Theorem XI, as applied to the integral as the limit of
approximating sums. One of the standard proofs that $\lim_{x \to 0} (\sin x)/x = 1$ uses the
principle of Theorem XII, with $f(x) = \cos x$, $g(x) = (\sin x)/x$, $h(x) = 1/\cos x$.
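
The squeezing described in the last sentence can be observed numerically. The Python sketch below tabulates $\cos x$, $(\sin x)/x$, and $1/\cos x$ at a few small values of $x$; the sample points are our own choice, and the table illustrates the inequalities without proving them.

```python
import math

rows = []
for x in (0.5, 0.1, 0.01, 0.001, -0.001, -0.1):
    lower = math.cos(x)          # f(x) in Theorem XII
    middle = math.sin(x) / x     # g(x), the squeezed function
    upper = 1.0 / math.cos(x)    # h(x)
    rows.append((x, lower, middle, upper))

for x, lower, middle, upper in rows:
    print(f"x={x:8.3f}  cos x={lower:.9f}  sin x/x={middle:.9f}  1/cos x={upper:.9f}")
```

Both outer columns tend to 1 as $x \to 0$, so the middle column is forced to do the same.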


EXERCISES 

1. Show that $\lim_{x \to 0} (\cos x)/x^2 = +\infty$. Suggestion: Note that $\cos x \ge \tfrac{1}{2}$ if $|x| \le \pi/3$.
If $M > 0$ is given, choose $\delta$ as the lesser of the numbers $\pi/3$, $(2M)^{-1/2}$, and explain why
$M < (\cos x)/x^2$ if $0 < |x| < \delta$.

2. Show that $\lim_{x \to 0} (1 + \sin^2 x)/x^2 = +\infty$.

3. What is $\lim_{x \to 0} (\sin^2 x)/x^4$? Justify your answer.

4. Suppose $f(x) \ge m > 0$ and $g(x) > 0$ when $x$ is any point in an interval $a < x < b$
containing $x_0$. Also suppose $g(x) \to 0$ as $x \to x_0$. Show that

$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = +\infty.$$

5. Show that $\lim_{x \to +\infty} (2 + \sin x)/e^{-x} = +\infty$.

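6. Given $\epsilon > 0$, find $M$ so that

$$\left| \frac{9x + 4}{3x - 1} - 3 \right| < \epsilon \quad \text{if } M < x.$$

What statement about a limit does this prove?

7. Suppose that, as $x \to +\infty$ (or as $x \to x_0$), $|f(x)| \le M$, where $M$ is a constant, and
that $|g(x)| \to +\infty$. Show that

$$\frac{f(x)}{g(x)} \to 0.$$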

8. Write

$$f(x) = \frac{4x^2 - x - 1}{x^2 + 2x - 3} = \frac{4 - (1/x) - (1/x^2)}{1 + (2/x) - (3/x^2)}.$$

Then find $\lim_{x \to +\infty} f(x)$, using the rules for limits of sums and quotients.

9. If $P(x)$ and $Q(x)$ are polynomials of degrees $m$ and $n$ respectively, discuss

$$\lim_{x \to +\infty} \frac{P(x)}{Q(x)}$$

according as $m > n$, $m = n$, $m < n$. Show that the results when $x \to -\infty$ are the
same as when $x \to +\infty$, if $m \le n$ or if $m - n$ is even.

10. In the following cases discuss the behavior of the given function $f(x)$ as $x \to +\infty$.
Does $f(x)$ approach a number as limit? Does $f(x)$ tend to $+\infty$ or to $-\infty$? Does it do none of
these things?

(a) $\sin x$; (b) $\sin(1/x)$; (c) $(\sin x)/x$; (d) $x \sin x$; (e) $x^2 + x \sin x$;

(f) $x^2 + x^2 \cos x$; (g) $x + x^2 \sin x$.

11. If $\lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = +\infty$ we say that $f'(x) = +\infty$.
Likewise we define the meaning of $f'(x) = -\infty$. (a) If $f(x) = x^{1/3}$, what is $f'(0)$? (b) If
$f(x) = x^{2/3}$, what about $f'(0)$? What about $f'_+(0)$ and $f'_-(0)$? (c) If $f(x) = 0$ when $x < 0$,
$f(0) = 1$ and $f(x) = 2$ when $x > 0$, show that $f'(0) = +\infty$. This shows that $f'(x_0) = +\infty$ does not
imply that $f$ is continuous at $x_0$.

12. Prove that the law of the mean is true with the following hypotheses, which are
weaker than those imposed in §1.2: $f$ is continuous when $a \le x \le b$, and, for each $x$ of the
open interval $a < x < b$, $f$ is either differentiable (i.e., $f'(x)$ exists as a finite limit) or $f'(x)$ is $+\infty$
or $-\infty$ as defined in Exercise 11.


13. Prove Theorem XI and Theorem XII. For Theorem XI begin by supposing $M < A$,
and show that this leads to a contradiction. For Theorem XII let $f(x)$ and $h(x)$ approach $A$ as
limit, and note that, if $\epsilon > 0$, the inequalities $A - \epsilon < f(x) < A + \epsilon$, $A - \epsilon < h(x) < A + \epsilon$ must
hold when $|x - a|$ is sufficiently small.

14. State and prove a theorem similar to Theorem XI in which the inequalities are 
reversed. 


1.62 / LIMITS OF SEQUENCES 

A sequence is an ordered set of numbers in one-to-one correspondence with the
positive integers. This correspondence may be shown by numbering the terms of
the sequence in order:

$$s_1, s_2, s_3, \ldots, s_n, \ldots.$$

The sequence is then denoted symbolically by $\{s_n\}$. As examples we cite:

(a) $2, 4, 6, 8, \ldots, 2n, \ldots$;

(b) $\tfrac{1}{2}, \tfrac{1}{4}, \ldots, 1/(2n), \ldots$;

(c) $\tfrac{1}{2}, \tfrac{2}{5}, \tfrac{3}{10}, \ldots, n/(n^2 + 1), \ldots$;

(d) $1, 1 + \tfrac{1}{2}, 1 + \tfrac{1}{2} + \tfrac{1}{3}, \ldots, 1 + \tfrac{1}{2} + \cdots + (1/n), \ldots$;

(e) n = l,2,...). 

A sequence is in fact a particular kind of function, a function whose
independent variable $n$ ranges over the set of positive integers. We could (and
occasionally do) use the functional notation $f(n)$ for a sequence, but notations
such as $\{x_n\}$, $\{s_n\}$, $\{a_n\}$ are more common. Observe that $\{s_n\}$ is the symbol for the
sequence (function) as a whole, whereas $s_n$ is the symbol for the $n$th term (the
value of the function).

Sometimes it is convenient to have a notation in which the terms are
numbered $0, 1, 2, \ldots$ instead of $1, 2, 3, \ldots$. For instance, if the sequence is
$1, 2, 4, 8, 16, 32, \ldots$, it is convenient to denote it by $s_0, s_1, s_2, \ldots, s_n, \ldots$, so that
$s_n = 2^n$ (note that $s_0 = 2^0 = 1$).

The definition of the limit of a sequence is quite similar to the definition of the
limit of a function of a continuous variable, as given in §1.61. Thinking of the
sequence $\{s_n\}$ as a function, let us compare $\{s_n\}$ with a function $f(x)$ of the
continuous variable $x$. The definition of $\lim_{n \to \infty} s_n$ is then very similar to the definition
of $\lim_{x \to +\infty} f(x)$ (just preceding Example 3 in §1.61).

Definition. We say that $\lim_{n \to \infty} s_n = A$ if for each positive $\epsilon$ there is some integer
$N$ depending on $\epsilon$ such that $|s_n - A| < \epsilon$ whenever $N \le n$. In this case we say that
the sequence $\{s_n\}$ is convergent and that it has the limit $A$.

Example 1. We shall show that $\lim_{n \to \infty} (\tfrac{2}{3})^n = 0$.

Suppose $\epsilon > 0$ is given. We wish to find $N$ so that $N \le n$ will insure $(\tfrac{2}{3})^n < \epsilon$,
or, what is equivalent, $1/\epsilon < (\tfrac{3}{2})^n$. Now, in the sequence $\{(\tfrac{3}{2})^n\}$, each term is half
again as large as its predecessor, and the first term is $\tfrac{3}{2}$. Hence certainly
$(\tfrac{3}{2})^n > n(\tfrac{1}{2})$. Thus, to get $1/\epsilon < (\tfrac{3}{2})^n$, it is amply sufficient to have $1/\epsilon < n(\tfrac{1}{2})$, or
$2/\epsilon < n$. Therefore we take $N$ as the first integer which is greater than $2/\epsilon$, and
then certainly $(\tfrac{2}{3})^n < \epsilon$ if $N \le n$. This is proof that $\lim_{n \to \infty} (\tfrac{2}{3})^n = 0$.
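
The choice of $N$ in Example 1 is easily tried out on a machine. The Python sketch below assumes the sequence in question is $\{(\tfrac{2}{3})^n\}$, as the phrase “half again as large” indicates; the function name `first_index` and the sample values of $\epsilon$ are our own. It checks the estimate $(\tfrac{3}{2})^n > n/2$ and then the resulting $N$ over a finite range of $n$.

```python
import math

def first_index(eps):
    """First integer greater than 2/eps, the N chosen in Example 1."""
    return math.floor(2.0 / eps) + 1

# The key estimate of the example: (3/2)^n > n/2.
estimate_holds = all(1.5 ** n > n / 2.0 for n in range(1, 201))

checks = {}
for eps in (0.5, 0.1, 0.01):
    N = first_index(eps)
    # Test (2/3)^n < eps for a stretch of indices n >= N.
    checks[eps] = all((2.0 / 3.0) ** n < eps for n in range(N, N + 200))

print(estimate_holds, checks)
```

The $N$ obtained this way is far from the smallest possible; the definition asks only that some $N$ exist.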

Example 2. We now show that $\lim_{n \to \infty} a^n = 0$ if $0 < a < 1$. This includes
Example 1 as a special case.

We can express $a$ in the form

$$a = \frac{1}{1 + h}, \quad h > 0.$$

This is because $0 < a < 1$; $h$ is given by

$$h = \frac{1 - a}{a}.$$

We now use (1.2-5), with $\alpha = n$:

$$a^{-n} = (1 + h)^n \ge 1 + nh.$$

This implies

$$a^n \le \frac{1}{1 + nh}.$$

Suppose now that $\epsilon > 0$ is given. We wish to find an $N$ such that $N \le n$ implies
$a^n < \epsilon$. It is sufficient to have

$$\frac{1}{1 + nh} < \epsilon, \quad \text{or} \quad \frac{1}{\epsilon} < 1 + nh, \quad \text{or} \quad \frac{1 - \epsilon}{\epsilon} < nh.$$

Since $h > 0$, some multiple of $h$ will exceed $(1 - \epsilon)/\epsilon$. Let $N$ be a positive integer
such that $(1 - \epsilon)/\epsilon < Nh$. Then $N \le n$ implies $a^n < \epsilon$, and the proof is complete.
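
The recipe of Example 2 translates directly into a computation. In the Python sketch below, `index_for` is a hypothetical helper that returns a positive integer $N$ with $(1 - \epsilon)/\epsilon < Nh$, where $h = (1 - a)/a$; the sample pairs $(a, \epsilon)$ are chosen only for illustration.

```python
import math

def index_for(a, eps):
    """Return a positive integer N with (1 - eps)/eps < N*h, where h = (1 - a)/a."""
    h = (1.0 - a) / a
    return max(1, math.floor((1.0 - eps) / (eps * h)) + 1)

checks = {}
for a, eps in ((0.9, 0.01), (0.5, 0.3), (2.0 / 3.0, 0.05), (0.99, 0.5)):
    N = index_for(a, eps)
    # Verify a^n < eps for a stretch of indices n >= N.
    checks[(a, eps)] = (N, all(a ** n < eps for n in range(N, N + 100)))

for key, value in checks.items():
    print(key, value)
```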

Theorems X, XI, and XII have analogues for sequences. The student can 
easily formulate these analogues alone. The general theorems about sums, 
products, and quotients (see §1.6) also apply to sequences. 

Example 3. Find

$$\lim_{n \to \infty} \frac{2n^3 + n^2 - 7n}{n^3 + 2n + 2}.$$

We write the general term of the sequence in the form

$$\frac{2 + (1/n) - (7/n^2)}{1 + (2/n^2) + (2/n^3)}.$$

By the theorem on sums,

$$\lim_{n \to \infty} \left( 2 + \frac{1}{n} - \frac{7}{n^2} \right) = 2 \quad \text{and} \quad \lim_{n \to \infty} \left( 1 + \frac{2}{n^2} + \frac{2}{n^3} \right) = 1.$$

By the theorem on quotients,

$$\lim_{n \to \infty} \frac{2 + (1/n) - (7/n^2)}{1 + (2/n^2) + (2/n^3)} = \frac{2}{1}.$$

Hence the required limit is 2.
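
A few terms of the sequence in Example 3 may be computed exactly to see the approach to 2. The Python sketch below uses exact rational arithmetic from the standard `fractions` module; the function name `term` and the sampled indices are our own choices.

```python
from fractions import Fraction

def term(n):
    """The n-th term (2n^3 + n^2 - 7n)/(n^3 + 2n + 2), computed exactly."""
    n = Fraction(n)
    return (2 * n**3 + n**2 - 7 * n) / (n**3 + 2 * n + 2)

# Distance from the claimed limit 2 at n = 10, 100, ..., 10^6.
gaps = [abs(term(10**k) - 2) for k in range(1, 7)]

for k, gap in zip(range(1, 7), gaps):
    print(10**k, float(gap))
```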

Many important sequences have a property which is described by the word
monotonic. A sequence $\{s_n\}$ is called nondecreasing if $s_1 \le s_2 \le s_3 \le \cdots$, i.e., if
$s_n \le s_{n+1}$ for every positive integer $n$. It is called strictly increasing (or just
increasing) if $s_1 < s_2 < s_3 < \cdots$. For example, the sequence

$$1, 1, 2, 2, 3, 3, \ldots$$

is nondecreasing. The $n$th term of this sequence is given by $\left[ \frac{n+1}{2} \right]$ (the greatest
integer $\le \frac{n+1}{2}$). The sequence $\{s_n\}$, where $s_n = \frac{n}{n+1}$, is strictly increasing:

$$\tfrac{1}{2}, \tfrac{2}{3}, \tfrac{3}{4}, \ldots.$$

Likewise, we call a sequence nonincreasing if $s_n \ge s_{n+1}$ for every $n$, and (strictly)
decreasing if $s_n > s_{n+1}$ for every $n$. A sequence of any one of these four types is
called monotonic.

One very important type of nondecreasing sequence is the sequence of
decimal approximations to a positive number. For example, let $s_n$ be the number
obtained by writing the first $n$ decimal places of the number $\tfrac{1}{3}$:

$$s_1 = 0.3, \quad s_2 = 0.33, \quad s_3 = 0.333, \quad \ldots.$$

The general formula for $s_n$ may be written

$$s_n = \frac{3}{10} + \frac{3}{10^2} + \cdots + \frac{3}{10^n}.$$

Using the formula

$$a + ar + ar^2 + \cdots + ar^{n-1} = a\,\frac{1 - r^n}{1 - r} \tag{1.62-1}$$

for the sum of a geometric progression, we find

$$s_n = \frac{3}{10} \cdot \frac{1 - (\tfrac{1}{10})^n}{1 - \tfrac{1}{10}} = \frac{1}{3} \left[ 1 - \left( \frac{1}{10} \right)^n \right].$$

We see from this that

$$\lim_{n \to \infty} s_n = \tfrac{1}{3}. \tag{1.62-2}$$

When we write

$$\tfrac{1}{3} = 0.333\ldots,$$

the meaning is exactly that expressed by (1.62-2).

Similar remarks apply to all nonterminating decimal representations, e.g.,

$$\tfrac{2}{3} = \lim_{n \to \infty} s_n, \quad s_n = 0.66 \ldots 6 \ (n \text{ sixes}), \qquad \sqrt{3} = \lim_{n \to \infty} x_n,$$

where $x_1 = 1.7$, $x_2 = 1.73$, $x_3 = 1.732$, $x_4 = 1.7320$, etc.
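
The closed form for the decimal approximations to $\tfrac{1}{3}$ can be confirmed with exact rational arithmetic. The following Python sketch is an illustration; the names `s` and `closed_form` are our own.

```python
from fractions import Fraction

def s(n):
    """0.33...3 with n threes, as the sum 3/10 + 3/10^2 + ... + 3/10^n."""
    return sum(Fraction(3, 10**k) for k in range(1, n + 1))

def closed_form(n):
    """The value (1/3) * (1 - (1/10)^n) obtained from (1.62-1)."""
    return Fraction(1, 3) * (1 - Fraction(1, 10**n))

agree = all(s(n) == closed_form(n) for n in range(1, 31))
# The distance from 1/3 is exactly 1/(3 * 10^n), which tends to 0.
gaps = [Fraction(1, 3) - s(n) for n in range(1, 6)]

print(agree, gaps)
```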

Suppose that $\{s_n\}$ is a nondecreasing sequence. There are just two possibilities:
either (1) there is some number $M$ such that $s_n \le M$ for every $n$, or (2)
$s_n \to +\infty$ (this means that no matter what $M$ is chosen, $M < s_n$ for all sufficiently
large values of $n$). In case (1) we say that the sequence is bounded above.
Likewise, for a nonincreasing sequence $\{s_n\}$ there are just two possibilities:
either (1) the sequence is bounded below, i.e., there is some number $M$ such that
$M \le s_n$ for every $n$, or (2) $s_n \to -\infty$ as $n \to \infty$.

A sequence is called bounded if it is bounded both above and below.
Observe that an increasing or nondecreasing sequence is always bounded below,
since $s_1 \le s_n$ for every $n$. Likewise, a decreasing or nonincreasing sequence is
bounded above, since $s_n \le s_1$.

There is a very important theorem which asserts that a monotonic sequence
is convergent (i.e., has a limit) provided it is bounded.

THEOREM XIII. Suppose that either

(a) $s_n \le s_{n+1}$ and $s_n \le M$, $n = 1, 2, \ldots$,
or (b) $s_{n+1} \le s_n$ and $M \le s_n$, $n = 1, 2, \ldots$,

where $M$ is a constant. Then $\lim_{n \to \infty} s_n$ exists.


The great importance of this theorem lies in the fact that by using it we can 
be sure that certain sequences are convergent without knowing precisely what 
the limits are. The theorem is proved in Chapter 2 (Theorems III and IV). The 
proof is based upon a careful discussion of the nature of the real number system. 


Example 4. Let

$$s_n = \frac{1 \cdot 3 \cdot 5 \cdots (2n - 1)}{2 \cdot 4 \cdot 6 \cdots (2n)}.$$

Show that this sequence is monotonic and bounded, and therefore convergent.

In order to make sure the notation is understood, let us write a few terms of
the sequence, by substituting successively $n = 1, 2, 3, \ldots$. We have

$$s_1 = \frac{1}{2}, \quad s_2 = \frac{1 \cdot 3}{2 \cdot 4} = \frac{3}{8}, \quad s_3 = \frac{1 \cdot 3 \cdot 5}{2 \cdot 4 \cdot 6} = \frac{5}{16}, \quad \text{etc.}$$

The sequence is decreasing. For,

$$s_2 = \tfrac{3}{4} s_1, \quad s_3 = \tfrac{5}{6} s_2, \quad s_4 = \tfrac{7}{8} s_3,$$

and in general

$$s_{n+1} = \frac{2n + 1}{2n + 2}\, s_n < s_n.$$

All the terms are positive; so that $0 < s_n \le \tfrac{1}{2}$. Thus the sequence is bounded.
Hence it must have a limit. This argument does not show what the limit may be.
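
The recurrence $s_{n+1} = \frac{2n+1}{2n+2}\, s_n$ gives a convenient way to generate the terms of Example 4. The Python sketch below, whose function name `terms` is our own, builds the first terms exactly and checks over a finite range that they decrease and stay between 0 and $\tfrac{1}{2}$.

```python
from fractions import Fraction

def terms(count):
    """First `count` terms of s_n = (1*3*5*...*(2n-1)) / (2*4*6*...*(2n))."""
    s = Fraction(1, 2)          # s_1
    out = [s]
    for n in range(1, count):
        s *= Fraction(2 * n + 1, 2 * n + 2)   # s_{n+1} = (2n+1)/(2n+2) * s_n
        out.append(s)
    return out

values = terms(200)
decreasing = all(later < earlier for earlier, later in zip(values, values[1:]))
bounded = all(0 < v <= Fraction(1, 2) for v in values)

print(values[:3], decreasing, bounded, float(values[-1]))
```

As the text remarks, monotonicity and boundedness guarantee a limit without revealing its value; the printed terms merely suggest how slowly the sequence decreases.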


Example 5. Consider the sequence defined by

$$s_n = \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{n!},$$

the first few terms of which are

$$s_1 = 1, \quad s_2 = 1 + \frac{1}{2} = \frac{3}{2}, \quad s_3 = 1 + \frac{1}{1 \cdot 2} + \frac{1}{1 \cdot 2 \cdot 3} = \frac{5}{3}.$$

This sequence is plainly monotonic, for

$$s_{n+1} = s_n + \frac{1}{(n + 1)!}, \quad s_n < s_{n+1}.$$

We shall show that the sequence is bounded above. Now, if $n > 2$,

$$\frac{1}{n!} = \frac{1}{1 \cdot 2 \cdot 3 \cdots n} < \frac{1}{2 \cdot 2 \cdots 2} = \frac{1}{2^{n-1}}.$$

Therefore, if $n > 2$,

$$s_n < 1 + \frac{1}{2} + \frac{1}{2^2} + \cdots + \frac{1}{2^{n-1}}.$$

But, by (1.62-1) taking $a = 1$, $r = \tfrac{1}{2}$, we have

$$1 + \frac{1}{2} + \frac{1}{2^2} + \cdots + \frac{1}{2^{n-1}} = 2 \left[ 1 - \left( \frac{1}{2} \right)^n \right] < 2.$$

Therefore $s_n < 2$ for all $n$. Consequently, by Theorem XIII, the sequence $\{s_n\}$ is
convergent. The theorem does not tell us the exact value of the limit, though of
course we can see that it is not larger than 2.
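
The comparison with a geometric progression in Example 5 can be checked term by term. In the Python sketch below, `s` and `geometric_bound` are names of our own choosing, and the sums are computed exactly with the standard `fractions` module.

```python
from fractions import Fraction
from math import factorial

def s(n):
    """Partial sum 1/1! + 1/2! + ... + 1/n! of Example 5."""
    return sum(Fraction(1, factorial(k)) for k in range(1, n + 1))

def geometric_bound(n):
    """The comparison sum 1 + 1/2 + 1/2^2 + ... + 1/2^(n-1)."""
    return sum(Fraction(1, 2**k) for k in range(n))

# For n > 2 the partial sum lies strictly below the geometric sum, which is below 2.
strict = all(s(n) < geometric_bound(n) < 2 for n in range(3, 31))
# The geometric sum agrees with the closed form 2 * (1 - (1/2)^n) from (1.62-1).
closed = all(geometric_bound(n) == 2 * (1 - Fraction(1, 2**n)) for n in range(1, 31))

print(strict, closed, float(s(30)))
```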

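Example 6. Let $\{s_n\}$ be the sequence

$$s_n = \left( 1 + \frac{1}{n} \right)^n.$$

We shall show that the sequence is increasing and bounded above. First we write
the binomial expansion

$$(1 + a)^n = 1 + na + \frac{n(n - 1)}{1 \cdot 2} a^2 + \frac{n(n - 1)(n - 2)}{1 \cdot 2 \cdot 3} a^3 + \cdots + a^n.$$

The coefficient of $a^k$ ($1 \le k \le n$) is

$$\frac{n(n - 1)(n - 2) \cdots (n - k + 1)}{k!}.$$

Putting $1/n$ in place of $a$, we have

$$\left( 1 + \frac{1}{n} \right)^n = 1 + n \cdot \frac{1}{n} + \frac{n(n - 1)}{1 \cdot 2} \cdot \frac{1}{n^2} + \cdots + \frac{1}{n^n}.$$

The expression on the right has $n + 1$ terms, a typical one of which is

$$\frac{n(n - 1) \cdots (n - k + 1)}{k!} \cdot \frac{1}{n^k} = \frac{1}{k!} \left( 1 - \frac{1}{n} \right) \left( 1 - \frac{2}{n} \right) \cdots \left( 1 - \frac{k - 1}{n} \right).$$

Thus

$$\left( 1 + \frac{1}{n} \right)^n = 1 + 1 + \frac{\left( 1 - \frac{1}{n} \right)}{2!} + \frac{\left( 1 - \frac{1}{n} \right) \left( 1 - \frac{2}{n} \right)}{3!} + \cdots + \frac{\left( 1 - \frac{1}{n} \right) \cdots \left( 1 - \frac{n - 1}{n} \right)}{n!}. \tag{1.62-3}$$

Now suppose that $n$ is replaced by $n + 1$, so as to form the corresponding
formula for the expression

$$\left( 1 + \frac{1}{n + 1} \right)^{n+1}.$$

We see from (1.62-3) that each of the numerators after the first two terms on the
right in (1.62-3) is increased when $n$ is replaced by $n + 1$. Moreover, the total
number of terms on the right is increased from $n + 1$ to $n + 2$. Hence it is
certainly the case that

$$\left( 1 + \frac{1}{n} \right)^n < \left( 1 + \frac{1}{n + 1} \right)^{n+1}.$$

This shows that our sequence is increasing. Moreover, from (1.62-3) it is clear
that, if $n > 1$,

$$\left( 1 + \frac{1}{n} \right)^n < 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{n!}.$$

In Example 5, we saw that

$$\frac{1}{1!} + \frac{1}{2!} + \cdots + \frac{1}{n!} < 2.$$

Therefore

$$\left( 1 + \frac{1}{n} \right)^n < 3. \tag{1.62-4}$$

This shows that our sequence is bounded above. It therefore has a limit. The
limit is denoted by the letter $e$:

$$e = \lim_{n \to \infty} \left( 1 + \frac{1}{n} \right)^n. \tag{1.62-5}$$

This number $e$ is taken as the base in defining natural logarithms, of which we
assume the student already has a working knowledge.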
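
The two properties established in Example 6 can be observed on the first few dozen terms. The Python sketch below computes $(1 + 1/n)^n$ exactly for $n = 1, \ldots, 40$; the function name `b` and the range of $n$ are our own choices, and a finite check of this kind illustrates the argument without replacing it.

```python
import math
from fractions import Fraction

def b(n):
    """The n-th term (1 + 1/n)^n, computed exactly as a rational number."""
    return (1 + Fraction(1, n)) ** n

values = [b(n) for n in range(1, 41)]
increasing = all(earlier < later for earlier, later in zip(values, values[1:]))
below_three = all(v < 3 for v in values)

print(increasing, below_three, float(values[-1]), math.e)
```

The convergence is slow: even at $n = 40$ the term still differs from $e$ in the second decimal place.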

The definition of the limit of a sequence enables us (theoretically, at least) to
decide whether any specified number $A$ is or is not the limit of a given sequence
$\{s_n\}$. To make the decision in a given case we must work with inequalities to
form an estimate of the magnitude of the difference $s_n - A$ as $n$ becomes larger
and larger. In practice we often find the limits of sequences by using the theorem
on limits of sums, products, and quotients. In this procedure the given sequence
is expressed in terms of other sequences whose limits we already know.

There are many cases, however, in which we cannot find the limit of a
sequence by direct use of the definition or by using the theorem on limits of
sums, products, and quotients. It is very important to be able to recognize, by
intrinsic characteristics of the sequence itself, that the sequence is convergent.
Then we can say with certainty: “By virtue of such and such a characteristic
possessed by this sequence $\{s_n\}$, there must exist a number $A$ such that
$\lim_{n \to \infty} s_n = A$.” One such feature is the property of being monotonic and bounded.
Not all convergent sequences possess this characteristic, of course, but it is
nevertheless of great importance.

In higher analysis there are many situations in which we assert the existence 
of something having specified properties. A problem leading to such an assertion 
may be called an existence problem. Many important concepts are introduced via 
existence problems. Definite integrals and functions defined by infinite series are 
examples. 


EXERCISES 

1. Which of the following sequences is convergent? (a) {(-1)"}, (b) 

(c) {n(-l)"}, (d) {n[l -(-!)"]}, (e) {(-l) 2n+1 }> (f) 




2. If s„ = 


3. If s r 


n + n — 1 
3n 2 + 1 
6 n 3 + 2n + 1 


and € > 0, find N so that |s„ - 3 I < e if N ^ n. 


6 — e 


, find A so that if 0 < e < 6, |s„ - A| < e provided n > . What 


n —* 00 


does this show about lim 

4. If $s_n = \frac{10^n}{n!}$, show that $s_n \le \frac{10^{10}}{10!} \cdot \frac{10}{n}$ if $n > 10$. Hence, if $\epsilon > 0$, find $N$ so that
$|s_n| < \epsilon$ if $N \le n$. From what value of $n$ onward is $s_{n+1} < s_n$?

C T , 21 2*4 1 2-4*6 1 , 

5. Let Si = j • p, s 2 = YT 3 * p, 5 3 = y -g • p, etc. Write the general expression for 

2 

s n and show that s n . What do you conclude about Iim„^oo s„? Show that s„+i < s„. 

6. Let $P(x) = a_0 x^p + a_1 x^{p-1} + \cdots$ and $Q(x) = b_0 x^q + b_1 x^{q-1} + \cdots$ be polynomials, with
$a_0 b_0 \ne 0$. Discuss $\lim_{n \to \infty} P(n)/Q(n)$ according as (a) $p < q$, (b) $p = q$, (c) $p > q$.
(Compare with Example 3.)

7. (a) If $P(x)$ is a polynomial of degree $r > 0$, show that $\lim_{n \to \infty} P(n + 1)/P(n) = 1$.
(b) What is $\lim_{n \to \infty} P(2n)/P(n)$?

8 . Find 


lim 


H)' 


-(' 4 )' 


9. Using rationalization techniques, find the limits of the following sequences:
(a) $\{\sqrt{n + 1} - \sqrt{n}\}$; (b) $\{\sqrt{n}(\sqrt{n + 1} - \sqrt{n})\}$.

10. If $\lim_{n \to \infty} \frac{x_n - x}{x_n + x} = 0$, show that $\lim_{n \to \infty} x_n = x$. (Write $y_n = \frac{x_n - x}{x_n + x}$
and solve for $x_n$.)


11. Let f(x) be defined as /(x ) = lim „ , where x > 0. Find the value of f(x ) for 

n ->» X 1 X 

each positive x. 

/I— x 2 \" 

12. Consider lim ( — — 5) . For what values of x does the limit exist? Classify the 

n-»® \ 1 + X / 

values of x according to the value of the limit. 

13. Let $f(x) = x/|x|$ if $x \ne 0$, and define $f(0) = 0$. Show that $f(x) = \lim_{n \to \infty} (2/\pi) \tan^{-1}(nx)$.

14. Find each of the following limits:

(a) lim(A + A+ • * • + A); 

,™\n n n / 

/ 1 2 ^2 2 \ 

(b) lim( — H — 3 + ■•■ + — ). 

n — \m n n7 


15. If $s_n = \frac{1}{\sqrt{n^2 + 1}} + \frac{1}{\sqrt{n^2 + 2}} + \cdots + \frac{1}{\sqrt{n^2 + n}}$, show that $\left( 1 + \frac{1}{n} \right)^{-1/2} < s_n < 1$, and
hence find $\lim_{n \to \infty} s_n$.

16. (a) Prove that, if $C > 1$, $\lim_{n \to \infty} C^{1/n} = 1$, using the following suggestions. First
explain why $C^{1/n} > 1$. Then let $x_n = C^{1/n} - 1$ and use (1.2-5) to show that $C \ge 1 + n x_n$. It
then follows that $\lim_{n \to \infty} x_n = 0$. (Why?) Why then does $C^{1/n} \to 1$? (b) If $0 < C < 1$, show
that $\lim_{n \to \infty} C^{1/n} = 1$ by applying the result (not the method) of (a).

17. Let $a_n = n^{1/(2n)}$. For $n > 1$ write $a_n = 1 + x_n$. Use (1.2-5) to prove that
$x_n \le \frac{\sqrt{n} - 1}{n} < \frac{1}{\sqrt{n}}$. Use this to deduce that $n^{1/n} < 1 + \frac{2}{\sqrt{n}} + \frac{1}{n}$, and hence that $\lim_{n \to \infty} n^{1/n} = 1$.

18. Let $s_n = n/C^n$, where $C > 1$. Write $\sqrt{C} = 1 + x$ and use (1.2-5) to show that
$\sqrt{s_n} < 1/(x \sqrt{n})$. What do you conclude about $\lim_{n \to \infty} s_n$?

19. Let $s_n = \left( 1 + \frac{1}{n} \right)^{n+1}$. Show that $\frac{s_{n-1}}{s_n} = \frac{n}{n + 1} \left( 1 + \frac{1}{n^2 - 1} \right)^n$, and then use (1.2-5)
to prove that $s_{n-1} > s_n$. Show that $\lim_{n \to \infty} s_n = e$ (see (1.62-5)).


20. (a) Show that lim(l + =Ve. 

(b) Observe that (' + £) = (‘ + ^Tl)( 1 V) 

(c) Find lim i 1 4 - — ) . 

n — <-oo \ fl ) 


. Now find lim ( 1 H — 


imflUV. 

— .00 y nj 


21. Suppose $0 < s_n$ and $s_{n+1} < r s_n$, where $r$ is a constant such that $0 < r < 1$. Show
that $\lim_{n \to \infty} s_n = 0$.


22. Suppose $0 < s_n$ and $s_{n+1} \ge r s_n$, where $r$ is a constant such that $r > 1$. Show that
$\lim_{n \to \infty} s_n = +\infty$.

23. Suppose $C > 1$ and $s_n = C^{1/n}$. Show that $1 < s_n$ and $s_{n+1} < s_n$ (assume the contrary
in each case, and deduce a contradiction). Hence, by Theorem XIII, $\lim_{n \to \infty} s_n$ exists.
Denote the limit by $r$. Why is $r \ge 1$? Prove that $r = 1$ by showing that $r > 1$ leads to the
contradiction that $1/C \le 0$.


24. Let $a_n = 1 + \frac{1}{1!} + \frac{1}{2!} + \cdots + \frac{1}{n!}$, $b_n = \left( 1 + \frac{1}{n} \right)^n$. In the course of Example 6 it was
proved that $b_n < a_n$. Let $a = \lim_{n \to \infty} a_n$ (the existence of the limit follows from Example
5). By definition, $e = \lim_{n \to \infty} b_n$ (see (1.62-5)). Show that $a = e$, using the following
suggestions: First explain why you know that $e \le a$. Then refer to (1.62-3) and explain
why

$$1 + \frac{1}{1!} + \frac{\left( 1 - \frac{1}{n} \right)}{2!} + \frac{\left( 1 - \frac{1}{n} \right) \left( 1 - \frac{2}{n} \right)}{3!} + \cdots + \frac{\left( 1 - \frac{1}{n} \right) \cdots \left( 1 - \frac{p - 1}{n} \right)}{p!} < e$$

if $1 < p < n$. In this result let $n \to \infty$ and obtain $a_p \le e$. Now complete the argument as to
why $a = e$.


1.63 / THE LIMIT DEFINING A DEFINITE INTEGRAL

Here we have to do with a limit different in kind from the limit of a function of a 
continuous variable and the limit of a sequence. Our discussion will be couched 
in terms of the notation used in the definition of a definite integral in §1.5. Consider 
the approximating sums 


$$\sum_{i=1}^{n} f(x_i')\,\Delta x_i$$ (1.63-1)

associated with a given fixed function $f$ which is defined on the interval
$a \le x \le b$. Although the index $n$ appears here, these sums do not form a
sequence. A particular sum depends not merely on $n$, but on the points of
subdivision $x_1, x_2, \ldots, x_{n-1}$ and on the intermediate points $x_1', \ldots, x_n'$. We may
say that, the function and the interval being fixed, the approximating sum
(1.63-1) is a function of $n$ and of the points $x_1, x_2, \ldots, x_{n-1}, x_1', \ldots, x_n'$. When we
say that

$$\lim \sum_{i=1}^{n} f(x_i')\,\Delta x_i = \int_a^b f(x)\,dx,$$ (1.63-2)

we mean that to each positive number $\epsilon$ corresponds another positive number $\delta$
such that

$$\left|\sum_{i=1}^{n} f(x_i')\,\Delta x_i - \int_a^b f(x)\,dx\right| < \epsilon$$

for all choices of $n$ and the points $x_i$, $x_i'$ such that the greatest of the numbers
$\Delta x_1, \ldots, \Delta x_n$ is less than $\delta$.

We give this definition here for comparison with the definitions of the two 
kinds of limits already discussed. The problem of showing that the sums (1.63-1) 
do actually have a limit (when f is a continuous function) is another example of 
an “existence problem” for a real number. (See the two final paragraphs of 
§1.62.) As with most such problems, we cannot obtain a solution without having 
at our command a systematic knowledge of the fundamentals of the real number 
system. 
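
The dependence of the approximating sums on the subdivision and on the intermediate points can be seen numerically. The following Python sketch is an illustration only and not part of the formal development: the function names, the choice $f(x) = x^2$ on the interval from 0 to 1, and the use of randomly chosen points are all assumptions made for the example. It forms approximating sums for several random subdivisions and compares each with the known value 1/3 of the integral.

```python
import random

def riemann_sum(f, points, tags):
    """Approximating sum: f(x_i') * (x_i - x_{i-1}) summed over the pieces."""
    return sum(f(t) * (right - left)
               for left, right, t in zip(points, points[1:], tags))

def random_tagged_partition(a, b, n, rng):
    """Random subdivision a = x_0 <= ... <= x_n = b with a random tag in each piece."""
    interior = sorted(rng.uniform(a, b) for _ in range(n - 1))
    points = [a] + interior + [b]
    tags = [rng.uniform(left, right) for left, right in zip(points, points[1:])]
    return points, tags

def mesh(points):
    """Greatest of the lengths Delta x_1, ..., Delta x_n."""
    return max(right - left for left, right in zip(points, points[1:]))

def f(x):
    return x * x

rng = random.Random(0)      # fixed seed so the run is repeatable
exact = 1.0 / 3.0           # integral of x^2 from 0 to 1
results = []
for n in (10, 100, 1000, 10000):
    points, tags = random_tagged_partition(0.0, 1.0, n, rng)
    error = abs(riemann_sum(f, points, tags) - exact)
    results.append((n, mesh(points), error))
    print(n, mesh(points), error)
```

For this particular $f$ the derivative is at most 2 in absolute value on the interval, so each sum differs from the integral by at most twice the greatest of the $\Delta x_i$. The errors are therefore controlled by that greatest length alone, whatever points happen to be chosen, which is what the definition requires.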


1.64 / THE THEOREM ON LIMITS 
OF SUMS, PRODUCTS, AND QUOTIENTS 

In §1.6 we stated three fundamental theorems about limits [see also (1.1-4)-(1.1-6)].
The student has used these theorems from the very beginning of his study of
calculus. We are now going to give formal statement and proof of the propositions
as they apply to limits of functions of a continuous variable. First,
however, it will be convenient to consider certain rules governing the use of
absolute values. The absolute value $|A|$ of a number $A$ was defined in Example 3,
§1.1. Now for any two numbers $A$, $B$, it is always true that

$$|A + B| \le |A| + |B|.$$ (1.64-1)

There are four cases to consider: (1) $A$ and $B$ both positive, (2) $A$ and $B$ both
negative, (3) $A = 0$ or $B = 0$, (4) $A$ and $B$ of opposite sign, and neither of them
equal to 0. In the first three cases it is easily seen that $|A + B| = |A| + |B|$, while in
case (4) $|A + B| < |A| + |B|$. Thus (1.64-1) is true in all cases.

We now turn to the limit theorem. 


THEOREM XIV. Let $f$ and $g$ be defined in an interval containing $x = x_0$, but not
necessarily at the point $x = x_0$ itself. Suppose that the limits

$$\lim_{x\to x_0} f(x), \qquad \lim_{x\to x_0} g(x)$$

exist. Then the sum $f(x) + g(x)$ and the product $f(x)g(x)$ approach limits as $x$
tends to $x_0$, given by

$$\lim_{x\to x_0} \{f(x) + g(x)\} = \lim_{x\to x_0} f(x) + \lim_{x\to x_0} g(x),$$ (1.64-2)

$$\lim_{x\to x_0} \{f(x)g(x)\} = \{\lim_{x\to x_0} f(x)\}\{\lim_{x\to x_0} g(x)\}.$$ (1.64-3)

Furthermore, if $\lim_{x\to x_0} g(x) \ne 0$, the quotient $f(x)/g(x)$ has a limit given by

$$\lim_{x\to x_0} \frac{f(x)}{g(x)} = \frac{\lim_{x\to x_0} f(x)}{\lim_{x\to x_0} g(x)}.$$ (1.64-4)


Proof. Let the limits of $f(x)$ and $g(x)$ be denoted by $A$ and $B$, respectively.
We shall prove (1.64-2) and (1.64-4), leaving the proof of (1.64-3) as an exercise
for the student. To prove (1.64-2) we must show that, if a positive $\epsilon$ is given,
we can choose a positive $\delta$ such that

$$|(f(x) + g(x)) - (A + B)| < \epsilon \quad \text{if } 0 < |x - x_0| < \delta.$$ (1.64-5)

Now

$$(f(x) + g(x)) - (A + B) = (f(x) - A) + (g(x) - B),$$

and therefore, by an application of (1.64-1),

$$|(f(x) + g(x)) - (A + B)| \le |f(x) - A| + |g(x) - B|.$$ (1.64-6)

Now by hypothesis we can make $|f(x) - A|$ and $|g(x) - B|$ as small as we like by
restricting $x$ to lie sufficiently near $x_0$. In particular, there are positive numbers $\delta_1$
and $\delta_2$ such that

$$|f(x) - A| < \frac{\epsilon}{2} \quad \text{if } 0 < |x - x_0| < \delta_1$$ (1.64-7)

and

$$|g(x) - B| < \frac{\epsilon}{2} \quad \text{if } 0 < |x - x_0| < \delta_2.$$ (1.64-8)

Let $\delta$ be the smaller of the numbers $\delta_1$, $\delta_2$. It then follows from (1.64-6), (1.64-7),
and (1.64-8) that (1.64-5) holds. This completes the proof of (1.64-2).

In proving (1.64-4) we first of all observe that, since $\lim_{x\to x_0} g(x) = B$ and
$B \ne 0$, we can be sure that $|g(x)| > \frac{1}{2}|B|$ if we require $x$ to be sufficiently near $x_0$. It
suffices to choose $\delta_0$ so that

$$|g(x) - B| < \tfrac{1}{2}|B| \quad \text{if } 0 < |x - x_0| < \delta_0,$$

for then

$$B = (B - g(x)) + g(x),$$

and by (1.64-1) we have

$$|B| \le |B - g(x)| + |g(x)| < \tfrac{1}{2}|B| + |g(x)|,$$

and consequently

$$\tfrac{1}{2}|B| < |g(x)| \quad \text{if } 0 < |x - x_0| < \delta_0.$$ (1.64-9)

Now we can write

$$\frac{f(x)}{g(x)} - \frac{A}{B} = \frac{Bf(x) - Ag(x)}{g(x)B} = \frac{B[f(x) - A] + A[B - g(x)]}{g(x)B},$$

and so

$$\left|\frac{f(x)}{g(x)} - \frac{A}{B}\right| \le \frac{|B|\,|f(x) - A| + |A|\,|B - g(x)|}{|g(x)|\,|B|}.$$ (1.64-10)

(Here we have used the fact that the absolute value of a product is the product
of the absolute values.) Now let a positive $\epsilon$ be assigned arbitrarily. We wish to
choose a positive $\delta$ so that the left member of (1.64-10) is less than $\epsilon$ if
$0 < |x - x_0| < \delta$. Let

$$\epsilon_1 = \frac{|B|^2\,\epsilon}{2(|B| + |A|)}.$$

Choose $\delta_1$ and $\delta_2$ so that

$$|f(x) - A| < \epsilon_1 \quad \text{if } 0 < |x - x_0| < \delta_1,$$ (1.64-11)

$$|g(x) - B| < \epsilon_1 \quad \text{if } 0 < |x - x_0| < \delta_2.$$ (1.64-12)

At the same time we make sure that $\delta_2 < \delta_0$ so that (1.64-9) will hold. Now,
making use of (1.64-9), (1.64-11), and (1.64-12) we see that the right member of
(1.64-10) is less than

$$\frac{|B|\epsilon_1 + |A|\epsilon_1}{\frac{1}{2}|B|^2} = \epsilon$$

if $0 < |x - x_0| < \delta$, where $\delta$ is the smaller of the numbers $\delta_1$, $\delta_2$. This completes
the proof of (1.64-4).
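
The choice of $\delta$ in the proof of (1.64-4) is a definite recipe, and it may help to see it carried out. The Python sketch below is only an illustration under stated assumptions: the name `quotient_delta`, the helper functions `delta_f` and `delta_g` (which are assumed to supply a suitable $\delta$ for $f$ and for $g$ separately), and the sample functions $f(x) = 2x + 1$, $g(x) = x + 2$ with $x_0 = 1$ are all hypothetical and do not come from the text.

```python
def quotient_delta(eps, A, B, delta_f, delta_g):
    """Return a delta for f/g, following the steps of the proof of (1.64-4).

    delta_f(e) and delta_g(e) are assumed to return a positive number d such
    that |f(x) - A| < e (respectively |g(x) - B| < e) when 0 < |x - x0| < d.
    """
    delta0 = delta_g(abs(B) / 2)                        # secures (1.64-9)
    eps1 = abs(B) ** 2 * eps / (2 * (abs(B) + abs(A)))  # the auxiliary epsilon_1
    delta1 = delta_f(eps1)                              # secures (1.64-11)
    delta2 = min(delta_g(eps1), delta0)                 # (1.64-12), never larger than delta0
    return min(delta1, delta2)

# Hypothetical example: f(x) = 2x + 1, g(x) = x + 2, x0 = 1, so A = B = 3.
x0, A, B = 1.0, 3.0, 3.0
f = lambda x: 2 * x + 1
g = lambda x: x + 2
delta_f = lambda e: e / 2   # because |f(x) - 3| = 2|x - 1|
delta_g = lambda e: e       # because |g(x) - 3| = |x - 1|

eps = 0.01
delta = quotient_delta(eps, A, B, delta_f, delta_g)

# Sample points with 0 < |x - x0| < delta and record the worst deviation found.
samples = [x0 + delta * k / 1000 for k in range(-999, 1000) if k != 0]
worst = max(abs(f(x) / g(x) - A / B) for x in samples)
print(delta, worst)
```

The sampled deviation stays below the prescribed $\epsilon$, as the proof guarantees. The recipe is generous rather than sharp: a larger $\delta$ would also serve in this example, but the proof needs only that some positive $\delta$ works.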
EXERCISES

1. Prove part (1.64-3), using the following suggestions: Show that $|g(x)| < |B| + 1$ if
$|g(x) - B| < 1$. Write

$$f(x)g(x) - AB = (f(x) - A)g(x) + (g(x) - B)A,$$

and use (1.64-1). If $\epsilon > 0$ is given, let

$$\epsilon_1 = \frac{\epsilon}{|A| + |B| + 1}.$$

Now show that $|f(x)g(x) - AB| < \epsilon$ provided $|f(x) - A| < \epsilon_1$, $|g(x) - B| < 1$ and $|g(x) - B| < \epsilon_1$.
Explain how these last inequalities are to be guaranteed by appropriate restrictions on
$x$. Write out the entire proof in full detail.

2. Formulate and prove the counterpart of Theorem XIV for limits of sequences. 


MISCELLANEOUS EXERCISES 

1. Suppose that $f$ is continuous on $[a, b]$, that $f'(x)$ and $f''(x)$ exist when $a < x < b$,
that $f(a) = f(b) = 0$, and that there is a number $c$ such that $a < c < b$ and $f(c) > 0$. Prove
that there is a number $\xi$ between $a$ and $b$ such that $f''(\xi) < 0$.

2. Let $f(x) = \log_a x$ ($0 < a$, $a \ne 1$). Deduce that $f'(x) = \frac{f'(1)}{x}$, using nothing about
logarithms other than: (1) the assumption that $f'(x)$ exists when $x > 0$, and (2) the
property $\log_a(xu) = \log_a x + \log_a u$ when $x > 0$ and $u > 0$.

3. Find $\lim_{x\to\infty} \{\sqrt{(x + a)(x + b)} - x\}$.


4. In each of the following cases, investigate the one-sided limits at the point
indicated, and decide whether or not $\lim_{x\to x_0} f(x)$ exists.

(a) $f(x) = [x] + [3 - x]$, $x_0 = 2$;

tan x i 

(b) f(x) = e tnnx+y *0 = y J 

(c) f(x) - 2 -,,x sin xo = 0. 

5. Let $s_1 = \sqrt{2}$, $s_{n+1} = \sqrt{2s_n}$, $n = 1, 2, \ldots$; find $\lim_{n\to\infty} s_n$.

6. Suppose $0 < a_1 \le a_2 \le \cdots \le a_k$, where the $a$'s are fixed numbers. Let $b_n = (a_1^n + a_2^n + \cdots + a_k^n)^{1/n}$. Show that $\lim_{n\to\infty} b_n = a_k$.

7. (a) Show that $\lim_{n\to\infty} \sum_{p=0}^{n-1} \frac{n}{n^2 + p^2} = \frac{\pi}{4}$. (Bring in consideration of $\int_0^1 \frac{dx}{1 + x^2}$.)

(b) Find $\lim_{n\to\infty} \frac{1}{n} \sum_{p=1}^{n} \sin\frac{pa}{n}$, where $a > 0$.

8. In (1.2-4) take $f(x) = x^3$, $a \ne 0$. Obtain a formula connecting $a$, $\theta$, and $h$, and show
that $\left|\theta - \frac{1}{2}\right| \le \frac{|h|}{3|a|}$. Hence conclude that $\lim_{h\to 0} \theta = \frac{1}{2}$.

9. Let $(x_0, y_0)$ be a point of the ellipse $b^2x^2 + a^2y^2 = a^2b^2$. Let the tangent to the
ellipse at $(x_0, y_0)$ intersect the x-axis at $A$ and the y-axis at $B$. Find the minimum possible
value of the distance $AB$.

10. Define $f(x) = x \tan^{-1}(1/x)$ if $x \ne 0$, and $f(0) = 0$. Is $f$ continuous at $x = 0$? Is it
differentiable there?

11. Show, using the law of the mean, that

$$|x \log x| < |x \log \xi| + \xi$$

if $0 < x < \xi$. From this deduce that $\lim_{x\to 0+} x \log x = 0$. Specify carefully how to choose $\delta$ so
that $0 < x < \delta$ will imply $|x \log x| < \epsilon$, where $\epsilon$ is given in advance. Begin by choosing $\xi$ in
terms of $\epsilon$ in a suitable manner.

12. Suppose $f(x) = x^2 + 2x$, and define $g$ by $g(t) = t^2 \sin(\pi/t) + t$ if $t \ne 0$, $g(0) = 0$. Let
$F(t) = f(g(t))$. Find $F'(0)$.

13. Find the absolute maximum and minimum of $f(x) = x^2(1 - x)^3$ on the interval
$-1 \le x \le 2$ without using a graph.

14. Consider the functions

$f_0(x) = \cos x - 1$
$f_1(x) = \sin x - x$
$f_2(x) = \cos x - 1 + \frac{1}{2}x^2$
$f_3(x) = \sin x - x + \frac{1}{6}x^3$
$f_4(x) = \cos x - 1 + \frac{1}{2}x^2 - \frac{1}{24}x^4$
$f_5(x) = \sin x - x + \frac{1}{6}x^3 - \frac{1}{120}x^5$.

Note that $f_1'(x) = f_0(x)$ and that $f_0(x) \le 0$ if $x > 0$. Thus $f_1(x)$ decreases as $x$ increases,
and since $f_1(0) = 0$, we conclude that $f_1(x) < 0$ when $x > 0$. Next note that $f_2'(x) = -f_1(x)$.
Explain how you conclude that $f_2(x) > 0$ when $x > 0$. Continuing in this way, show that
for $x > 0$,

$$x - \frac{1}{6}x^3 < \sin x < x - \frac{1}{6}x^3 + \frac{1}{120}x^5$$

and

$$1 - \frac{1}{2}x^2 + \frac{1}{24}x^4 - \frac{1}{720}x^6 < \cos x < 1 - \frac{1}{2}x^2 + \frac{1}{24}x^4.$$

What is the generalization?
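
A numerical check is no substitute for the argument asked for in Exercise 14, but it can guard against slips in the coefficients. The short Python sketch below is an illustration only: the function names and the sample points $x = 0.1, 0.2, \ldots, 10$ are assumptions made for the example. It tests the two double inequalities for $\sin x$ and $\cos x$ at those points.

```python
import math

def sin_bounds(x):
    """Lower and upper polynomial bounds for sin x when x > 0."""
    lower = x - x**3 / 6
    upper = x - x**3 / 6 + x**5 / 120
    return lower, upper

def cos_bounds(x):
    """Lower and upper polynomial bounds for cos x when x > 0."""
    lower = 1 - x**2 / 2 + x**4 / 24 - x**6 / 720
    upper = 1 - x**2 / 2 + x**4 / 24
    return lower, upper

xs = [0.1 * k for k in range(1, 101)]   # sample points 0.1, 0.2, ..., 10.0
sin_ok = all(sin_bounds(x)[0] < math.sin(x) < sin_bounds(x)[1] for x in xs)
cos_ok = all(cos_bounds(x)[0] < math.cos(x) < cos_bounds(x)[1] for x in xs)
print(sin_ok, cos_ok)
```

Such a test covers only finitely many points and relies on floating-point arithmetic, so it suggests but does not establish the inequalities.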


15. Suppose that $f$ satisfies the hypotheses: $f$ is defined and continuous when
$a \le x < b$, differentiable when $a < x < b$, $f(a) = 0$, and $f(x) > 0$ if $a < x < b$. Prove that
there cannot be a positive constant $M$ such that $0 \le \frac{f'(x)}{f(x)} \le M$ when $a < x < b$.

16. Let $\{a_n\}$ be a sequence and suppose that $\{\sigma_n\}$ is the sequence whose nth term is
the arithmetic mean of the first $n$ terms of $\{a_n\}$, i.e. $\sigma_n = (1/n) \sum_{k=1}^{n} a_k$. Prove that if $\{a_n\}$
converges, then $\{\sigma_n\}$ converges and has the same limit. Construct an example which
shows that $\{\sigma_n\}$ may converge even when $\{a_n\}$ does not.

NUMBER SYSTEM


2 / NUMBERS 

Our experience with numbers begins with the positive integers (also called the 
whole numbers, or natural numbers). Next we become acquainted with zero, and 
in due course we become familiar with negative integers and with rational 
fractions (ratios of positive and negative integers). At some stage we learn the 
adjective irrational for numbers such as $\sqrt{2}$, $\sqrt[3]{5}$, $\pi$. In algebra we meet the
equation $x^2 = -1$ and are told that it gives rise to a new number $i$. Numbers such
as $i$, $2i$, $-7i$ are called pure imaginary, and numbers such as $3 + 5i$ are called
complex. Our learning, with numbers as with everything else, proceeds mainly
by particular cases and illustrative examples. But in due time it is possible to
reduce our knowledge to order and to give it logical coherence by a systematic
study of number and number systems. We make a beginning on such a systematic
study in this chapter.

The integers, the rational fractions, and the irrational numbers compose the 
number system which lies at the foundation of calculus and of all analysis. The 
numbers of this system are called the real numbers, and the system of all such 
numbers is called the real number system. 

The adjectives “real” and “imaginary,” as applied to numbers, have entirely 
conventional technical meanings. They are not meant to convey, and the student 
should not let them convey, any implications whatsoever of a philosophical 
nature about existence or nonexistence as genuine entities. 

We shall not attempt to define the concept of number or to build up the 
concepts of rational and irrational numbers from the concept of the integers. 
Instead, we shall take the real number system as something known (though 
somewhat imperfectly and unsystematically) by the student, and shall make 
specific the algebraic laws governing the real numbers. The student is already 
familiar with most of these laws, but it is now very important to know a 
complete set of properties of the real number system. From such a complete set 
we may deduce every property of the system. 

2.1 / THE FIELD OF REAL NUMBERS 

Addition and multiplication are the fundamental operations with ordinary num- 
bers. These operations conform to the following laws: 

                        Addition                      Multiplication

The commutative law:    a + b = b + a                 ab = ba
The associative law:    a + (b + c) = (a + b) + c     a(bc) = (ab)c
The distributive law:   a(b + c) = ab + ac

Throughout this section the word “number” will be understood to mean
“real number,” and symbols a, b, c, x, . . . will stand for real numbers.

There are two numbers with special properties, namely 0 and 1. The special 
properties are expressed by the laws 

$$a + 0 = a \quad \text{and} \quad a \cdot 1 = a$$ (2.1-1)

for every number a. 

The number 0 is special for addition, while 1 is special for multiplication.
The operations of subtraction and division may be defined with the aid of these
special numbers in the following way: To every number $a$ corresponds its
negative, $-a$, which is the “additive inverse” of $a$. By this we mean that $x = -a$
satisfies the equation

$$a + x = 0.$$ (2.1-2)

Likewise every number $a$ except 0 has a multiplicative inverse, denoted by $a^{-1}$.
That is, if $a \ne 0$, $x = a^{-1}$ satisfies the equation

$$ax = 1.$$ (2.1-3)

We then define the subtraction of $b$ from $a$ by the equation

$$a - b = a + (-b).$$ (2.1-4)

Similarly, we define the division of $a$ by $b$ as

$$\frac{a}{b} = a(b^{-1}).$$ (2.1-5)

The properties of the real numbers which we have just been discussing are 
summed up briefly in the language of modern algebra by saying that the real 
numbers form a field. The word “field” here has a special technical meaning. 
When we say that a system of numbers F constitutes a field we mean the 
following: 

1. If a and b are in F, then a + b and ab are in F. 

2. The commutative, associative, and distributive laws hold.

3. F contains distinct special numbers 0 and 1 with the properties (2.1-1).

4. Equation (2.1-2) has a solution in F for each $a$, and (2.1-3) has a solution in F for
each $a \ne 0$.

In abstract algebra it is shown how the other familiar laws of elementary
algebra are deducible from the laws governing a field. Among the important rules
that can be proved are: $a \cdot 0 = 0$ and $(-a)(-b) = ab$. We shall not undertake any
systematic deductions of this kind. We mention, however, the rule:

If $ab = 0$ and $b \ne 0$, then $a = 0$. (2.1-6)

This is proved as follows: Since $b \ne 0$, there is a number $b^{-1}$ such that $bb^{-1} = 1$.
From $ab = 0$ we conclude that $a(bb^{-1}) = 0 \cdot b^{-1}$, or $a \cdot 1 = 0$, or $a = 0$.

Note that the system of integers (positive, negative, and zero) is not a field,
although it fails to be one only through the fact that $a^{-1}$ need not be an integer
when $a$ is. The rational numbers do form a field. Thus there are fields other than
the field of real numbers.
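
The contrast between the integers and the rational numbers can be made concrete. The following Python sketch is an illustration only: it uses the standard `fractions` module for exact rational arithmetic, and the function names `inverse_in_integers` and `inverse_in_rationals` are hypothetical names chosen for the example. It checks the definitions (2.1-4) and (2.1-5) on sample rationals and shows that the integer 2 has no multiplicative inverse among the integers.

```python
from fractions import Fraction

def inverse_in_integers(a):
    """Return the multiplicative inverse of the integer a if it is an integer, else None."""
    if a in (1, -1):
        return a
    return None

def inverse_in_rationals(a):
    """Return the multiplicative inverse of a nonzero rational, again a rational."""
    return 1 / a

a = Fraction(3, 7)
b = Fraction(-5, 2)

print(a - b == a + (-b))                      # subtraction as defined in (2.1-4)
print(a / b == a * inverse_in_rationals(b))   # division as defined in (2.1-5)
print(b * inverse_in_rationals(b) == 1)       # x = b^{-1} satisfies (2.1-3)
print(inverse_in_integers(2))                 # None: no integer x satisfies 2x = 1
```

Only 1 and -1 have integer inverses, which is why the sketch treats those two cases separately.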

2.2 / INEQUALITIES. ABSOLUTE VALUE 

One of the important properties of the real numbers is that they are ordered; that
is, there is a notion “a is less than b” expressed by the inequality $a < b$; of any
two numbers $a$, $b$ one and only one of the following three things is true:

$$a < b \quad \text{or} \quad a = b \quad \text{or} \quad b < a.$$

The properties of order can be stated simply in terms of properties of positive
numbers. We express the fact that $a$ is positive by the symbols $0 < a$. The basic
laws governing the positive numbers are three in number:

If $0 < a$ and $0 < b$, then $0 < a + b$. (2.2-1)

If $0 < a$ and $0 < b$, then $0 < ab$. (2.2-2)

For each $a$, one and only one of the following relations is true:

$0 < a$ or $0 = a$ or $0 < -a$. (2.2-3)

In terms of positivity we lay down the following definitions:

If $b - a$ is positive, we say that $a$ is less than $b$ and write $a < b$. Under the same
conditions we also say that $b$ is greater than $a$ and write $b > a$.

Other properties of order, or rules for manipulating inequalities, are deducible
from the above definitions and the three basic laws. Among the important
rules are the following:

$0 < a^2$ if $a \ne 0$. (2.2-4)

If $a < b$ and $b < c$, then $a < c$. (2.2-5)

If $a < b$, then for any $c$, $a + c < b + c$. (2.2-6)

If $a < b$ and $0 < c$, then $ac < bc$. (2.2-7)


A field is said to be ordered if certain of its members are distinguished by 
calling them positive and if this notion of “being positive” satisfies the laws 
(2.2-1), (2.2-2), and (2.2-3). 

For many purposes it is convenient to introduce the symbolism $a \le b$,
meaning that either $a < b$ or $a = b$. The symbolism $b \ge a$ has the same meaning.

The notion of the absolute value of a real number is defined as follows (the
absolute value of $a$ is denoted by $|a|$):

$|a| = a$ if $0 < a$; $|a| = -a$ if $a < 0$; $|0| = 0$.

In actual calculations with absolute values we rely largely upon the two rules:

$$|ab| = |a|\,|b|,$$ (2.2-8)

$$|a + b| \le |a| + |b|.$$ (2.2-9)

(See §1.64 for a brief discussion of (2.2-9).)

The further rule

$$\bigl|\,|a| - |b|\,\bigr| \le |a - b|$$ (2.2-10)

is also convenient; it is deducible with the aid of (2.2-9).

We observe in passing that the inequality $|x| \le \epsilon$ (where $\epsilon > 0$) is equivalent
to the inequalities $x \le \epsilon$ and $-\epsilon \le x$, which we write as a double inequality
$-\epsilon \le x \le \epsilon$. By setting $x = b - a$ and doing a little transposing we see that
$|a - b| \le \epsilon$ is equivalent to $b - \epsilon \le a \le b + \epsilon$.
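
The rules (2.2-8), (2.2-9), and (2.2-10), together with the equivalence just noted, lend themselves to a quick experimental check. The Python sketch below is an illustration only, with all names and sample ranges chosen as assumptions for the example. It works with exact rational numbers from the standard `fractions` module so that no rounding enters, and it counts any violations found among randomly chosen pairs.

```python
from fractions import Fraction
import random

rng = random.Random(1)   # fixed seed so the run is repeatable

def random_rational():
    """A rational number with small numerator and denominator."""
    return Fraction(rng.randint(-50, 50), rng.randint(1, 20))

violations = 0
for _ in range(10000):
    a, b = random_rational(), random_rational()
    if abs(a * b) != abs(a) * abs(b):        # rule (2.2-8)
        violations += 1
    if abs(a + b) > abs(a) + abs(b):         # rule (2.2-9)
        violations += 1
    if abs(abs(a) - abs(b)) > abs(a - b):    # rule (2.2-10)
        violations += 1

# |a - b| <= eps should hold exactly when b - eps <= a <= b + eps.
eps = Fraction(1, 3)
pairs = [(random_rational(), random_rational()) for _ in range(10000)]
equivalent = all((abs(a - b) <= eps) == (b - eps <= a <= b + eps)
                 for a, b in pairs)

print(violations, equivalent)
```

No violation should appear, since the rules are theorems; the value of such a check lies in catching a misremembered sign or a reversed inequality.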


EXERCISES 

1. Prove (2.2-4). Note first that $a^2 = a \cdot a$ and $a^2 = (-a) \cdot (-a)$. Then use (2.2-2) and
(2.2-3).

2. Prove (2.2-5). Appeal to (2.2-1) and the fact that $x < y$ is equivalent to $0 < y - x$.

3. Prove (2.2-6).

4. Prove (2.2-7).

5. Write $a = (a - b) + b$ and apply the rule (2.2-9) to obtain $|a| \le |a - b| + |b|$, whence
$|a| - |b| \le |a - b|$. Why is it also true that $|b| - |a| \le |a - b|$? Explain now why (2.2-10) is
correct.


2.3 / THE PRINCIPLE OF MATHEMATICAL INDUCTION 

The natural numbers 1, 2, 3, . . . occupy a position of especial importance in the
field of real numbers. To single out the class of all natural numbers from the rest 
of the real numbers we may make the following assertion: The totality of natural 
numbers forms the smallest class of real numbers possessing the following 
two properties: 

1. The number 1 is a member of the class. 

2. If x is a member of the class, so is x + 1. 

The statement that the natural numbers form the smallest class having these 
properties means that any collection of real numbers having these two properties 
must include all the natural numbers. 

The characteristic feature of the totality of natural numbers, as just described,
is logically equivalent to the principle of mathematical induction (also
called complete induction). We formulate the principle as follows:

Let A(n) denote a proposition (e.g., a verbal statement, or a formula)
associated with the natural number n. Suppose it is possible to show that the
proposition is true if n = 1, and suppose also that, for each particular n, we can
prove the truth of A(n + 1) if we assume the truth of A(n). Then A(n) is true for every
natural number n.

To see the validity of the principle, let S be the class of natural numbers n 
such that A(n) is true. The assumption that A(l) is true means that 1 belongs to 
S. Also, the fact that assumption of the truth of A(n) allows us to deduce the
truth of A(n + 1) means that if n is in S, so is n + 1. Hence, by the remarks made
in the first paragraph of this section, S must contain all the natural numbers. 
That is, A(n) must be true for every n. 

We shall not analyze in detail the logical position of the principle of 
mathematical induction in the description of the natural numbers. If the real 
numbers are thought of as having been built up from the numbers 1, 2, 3, . . ., we 
need the principle of mathematical induction as an axiom about the natural 
numbers. If, on the other hand, we start from the assumption that the real 
numbers are somehow presented to us as a particular sort of ordered field, we 
may define the natural numbers as the smallest class of real numbers which 
contains the number 1, and which contains x + 1 if it contains x. The theory of
classes or sets, into which we prefer not to venture at this stage, permits us to 
prove that there is such a smallest class. The principle of mathematical induction 
is, from this latter point of view, a theorem about the class of natural numbers. 

By way of illustrating the principle of mathematical induction, we shall 
prove formally some things about natural numbers. 

Example 1. Every natural number is positive. 

In the first place, $0 < 1$. This follows from (2.2-4), since $1 \ne 0$ and $1 = 1^2$. To
complete the proof of the general assertion, suppose $0 < n$ for a particular $n$.
Then $0 < n$ and $0 < 1$ imply $0 < n + 1$, by (2.2-1). Thus $0 < n$ for every natural
number $n$, by the induction principle.

From now on we shall usually refer to “the natural numbers” as “the 
positive integers.” The unmodified term “integers” refers to the class consisting 
of the positive integers 1, 2, 3, . . . , their negatives -1, -2, -3, . . . , and the
number 0.

We shall take for granted without formal proof such familiar facts as that 
there is no integer between n and n + 1, if n is an integer. 

Example 2. If S is a class of positive integers containing at least one 
member, it contains a smallest number. 

Most students will feel inclined to accept this as true without demonstration. 
A proof is logically necessary, however. Observe that the assertion ceases to be 
true if the word “integers” is replaced by “real numbers”; for instance, there is 
no smallest member of the class of positive rational numbers. 

The proof of the assertion in Example 2 uses the characteristic properties of 
the class of natural numbers in an interesting way. Observe, in the first place, 
that if 1 is in S, 1 is the smallest member of S, since there is no positive integer 
less than 1. Hence, we need consider only the case in which 1 does not belong to 
S. We now let T be the class of positive integers p such that p < n for every n 
in S. This class T contains 1 since 1 is not in S. On the other hand, T does not 
contain all positive integers, since some positive integers belong to S. Hence, it 
cannot be that T contains p + 1 whenever it contains p, for in that case T would 
contain all positive integers, by the characteristic properties of the class of all 
such integers. There must therefore be some integer $p_0$ in T such that $p_0 + 1$ is
not in T. Then, by the way in which T is defined, there must be an integer $n_0$ in S
such that $n_0 \le p_0 + 1$. We assert that $n_0$ is the smallest integer in S. For, if $n$ is
any member of S, we have $p_0 < n$. Let $m = n - p_0$, or $p_0 + m = n$. Here $m$ is a
positive integer. Therefore, $n_0 \le p_0 + 1 \le p_0 + m = n$, or $n_0 \le n$. This completes
the argument.
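
The construction in the proof of Example 2 can be followed step by step on a small set. The Python sketch below is an illustration only: the function name `smallest_member` and the sample set are assumptions made for the example, and the sketch applies only to a finite nonempty set of positive integers, whereas the proof covers every nonempty class. It climbs through the class T until it reaches an integer $p_0$ in T with $p_0 + 1$ not in T.

```python
def smallest_member(S):
    """Follow the argument of Example 2 for a finite nonempty set S of positive integers."""
    if 1 in S:
        return 1                       # no positive integer is less than 1
    # T is the class of positive integers p with p < n for every n in S.
    # Since 1 is not in S, the integer 1 belongs to T.
    p = 1
    while all(p + 1 < n for n in S):   # is p + 1 still in T?
        p += 1
    p0 = p                             # p0 is in T, but p0 + 1 is not
    # Some n0 in S has n0 <= p0 + 1; since also p0 < n0, n0 must equal p0 + 1.
    n0 = p0 + 1
    assert n0 in S
    return n0

print(smallest_member({12, 7, 30, 9}))
```

For the sample set the loop stops at $p_0 = 6$, and the member found is 7. The internal assertion records the fact, used in the proof, that $n_0$ lies in S.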

EXERCISES 

1. Prove by induction that, for every natural number n, either 1 = n or 1 < n. 

2. Prove the validity of the following form of the principle of mathematical induction,
resting your argument on the form enunciated in the text. Let B(n) denote a
proposition associated with the integer n. Suppose B(n) is known (or can be shown) to be
true when $n = n_0$, and suppose the truth of B(n + 1) can be deduced if the truth of B(n) is
assumed. Then B(n) is true for every integer n such that $n_0 \le n$.

Suggestion: Let A(n) be the proposition $B(n_0 + n - 1)$.

2.4 / THE AXIOM OF CONTINUITY 

The facts expressed in the statement that the real numbers form an ordered field 
are quite familiar. We are now going to discuss a much less familiar property of 
the real number system. Most students beginning a course in advanced calculus 
will have had no experience in making use of this property, and quite possibly 
may never have heard of it. We call it the axiom of continuity.

The Axiom of Continuity. Suppose that all real numbers are separated into
two collections, which we denote by L and R, in such a way that 

1. every number is either in L or in R. 

2. each collection contains at least one number. 

3. if a is in L and b is in R, then a < b. 

Then there is a number c such that all numbers less than c are in L and all 
numbers greater than c are in R. (The number c itself may belong either to L or to 
R, depending on the particular way in which L and R are formed.)

It is convenient to have a name for a separation of all real numbers into
collections L and R meeting the specifications (1)-(3). We call such a separation
a cut; the number $c$ is then called the cut number. The cut number corresponding
to a particular cut is unique. For suppose a given cut has the distinct cut
numbers $c_1$ and $c_2$. One of them is the greater, say $c_1 < c_2$. Consider the number

$$b = \frac{c_1 + c_2}{2},$$

which lies halfway between $c_1$ and $c_2$: $c_1 < b < c_2$. Now $c_1 < b$ implies that $b$ is in
R, by one of the properties of the cut number $c_1$. Likewise $b < c_2$ implies that $b$
is in L. Hence $b$ is in both L and R. This is impossible, however, for by the
specification (3) L and R cannot have any members in common. The assumption
of distinct cut numbers has led to a contradiction. Therefore, we conclude that
any cut has but one cut number.

It is clear that if $c$ belongs to L then L consists of all numbers $x$ such that
$x \le c$, while R consists of all numbers $x$ such that $c < x$. If $c$ belongs to R, then
L consists of all $x$ such that $x < c$ and R consists of all $x$ such that $c \le x$.
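
The axiom asserts that a cut number exists; it does not say how to find it. For a cut given by an explicit rule, however, the cut number can be trapped between a member of L and a member of R by repeated halving. The Python sketch below is an illustration only, under stated assumptions: the cut chosen (L holds every $x$ with $x \le 0$ or $x^2 < 2$, and R holds the rest), the function names, and the use of exact rational arithmetic from the standard `fractions` module are all choices made for the example.

```python
from fractions import Fraction

def in_L(x):
    """Hypothetical cut: L holds every x with x <= 0 or x*x < 2; R is the rest."""
    return x <= 0 or x * x < 2

def approximate_cut_number(in_L, a, b, steps):
    """Halve repeatedly, keeping a in L and b in R, so that a <= c <= b throughout."""
    assert in_L(a) and not in_L(b)
    for _ in range(steps):
        mid = (a + b) / 2
        if in_L(mid):
            a = mid        # mid is in L, so c is not less than mid
        else:
            b = mid        # mid is in R, so c is not greater than mid
    return a, b

lo, hi = approximate_cut_number(in_L, Fraction(0), Fraction(2), 40)
print(float(lo), float(hi))
```

Every endpoint produced is rational, while the cut number of this particular cut is $\sqrt{2}$, which is not. The halving therefore never reaches the cut number; it only confines it to shorter and shorter intervals, and it is the axiom that guarantees there is a number in all of them.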

The idea of a cut was originated by the German mathematician Dedekind for 
use in his theory of the structure of the real number system (published in 1872 
under the title Continuity and Irrational Numbers). If the number system is built 
up by stages from the integers, it is possible to arrange the exposition in such a 
way that what we have called the axiom of continuity is provable as a theorem 
about the real numbers. But since we are taking the point of view that the real 
numbers are somehow given to us to work with, we are listing our assumptions 
about them. The axiom of continuity is one of our assumptions. It is a 
far-reaching assumption, for with it we can deal satisfactorily with the existence 
problems for real numbers referred to near the end of each of §§1.62 and 1.63. 

No further assumptions need be made about the real number system. The 
system is described with logical completeness by saying that it is an ordered field 
satisfying the axiom of continuity. 

At this point it is convenient to introduce a theorem which expresses what is 
called the Archimedean law of real numbers. Its proof is a good illustration of 
arguments using the axiom of continuity. 

THEOREM I. Let a and b be positive real numbers. There exists a positive 

integer n such that b < na. 

Proof. Suppose the theorem false, so that $na \le b$ for every positive integer
$n$. We shall define a cut as follows: Let L consist of all numbers $x$ such that
$x < na$ for some $n$, and let R consist of all numbers not in L, i.e., all numbers $y$
such that $na \le y$ for every $n$. We must verify that the three specifications for a cut
are fulfilled:

1. Every number is in either L or R, by definition.

2. $b$ is in R by our initial supposition, and $a$ is in L, since $a < na$ if $n = 2$. Thus
neither L nor R is without members.

3. If $x$ is in L and $y$ is in R, we have $x < na \le y$ for some $n$, and hence $x < y$.

Now let $c$ be the cut number. We observe that all the numbers $na$ are in L,
since $na < (n + 1)a$. Therefore $na \le c$, for $c < na$ would mean that $na$ is in R, by
one of the properties of a cut number. Hence, also $(n + 1)a \le c$. But this implies
$na \le c - a$. This being true for all $n$, we conclude that $c - a$ is in R. But
$c - a < c$, and hence $c - a$ is in L. We have now reached a contradiction, and the
proof is complete.
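
Theorem I is an existence statement about real numbers, and its proof does not produce $n$. When $a$ and $b$ are rational, however, a suitable $n$ can be computed directly. The Python sketch below is an illustration only: the function name `archimedean_n` and the sample values are assumptions made for the example, and the computation relies on exact rational arithmetic from the standard `fractions` module.

```python
from fractions import Fraction
import math

def archimedean_n(a, b):
    """Return a positive integer n with b < n*a, for positive rationals a and b."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    return math.floor(b / a) + 1   # the first integer exceeding b/a

a, b = Fraction(1, 1000), Fraction(22, 7)
n = archimedean_n(a, b)
print(n, b < n * a, (n - 1) * a <= b)
```

The value returned is the least $n$ that serves; any larger integer would do as well, since the theorem asks only for some $n$.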

2.5 / RATIONAL AND IRRATIONAL NUMBERS 

Numbers of the form p/q, where p and q are integers, are called rational. Real 
numbers which are not rational are called irrational. The theory of the nature of 
irrational numbers began with the ancient Greeks. It has been known since the
time of Pythagoras that $\sqrt{2}$ is not a rational number. (With a little knowledge
about expressing integers as products of their prime factors the student may 
easily prove this fact for himself.) The theory of incommensurable ratios, 
developed by Eudoxus, is in essence a geometrical treatment of irrational 
numbers. It was not, however, until the nineteenth century that mathematicians 
(Dedekind with his cuts important among them) arrived at the understanding we 
have today of the real number system and of the position of irrationals in it. 

It can be shown that between any two real numbers there are both rational 
and irrational numbers. If $a$ and $b$ are given real numbers with $a < b$, we obtain
a rational number $r$ such that $a < r < b$ by the following argument: since $0 < 1$
and $b - a > 0$ there exists (by Theorem I, §2.4) a positive integer $n$ such that
$1 < n(b - a)$. Let $m$ be the smallest integer such that $m > na$. Then $m - 1 \le na$,
and therefore

$$m \le na + 1 < na + n(b - a) = nb.$$

Consequently

$$a < \frac{m}{n} < b,$$

so that $r = m/n$ is a rational number of the required sort. Finding an irrational
number between $a$ and $b$ is left as an exercise for the student (see Exercise 2).
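
The argument just given is constructive once $n$ is in hand, and it can be traced on particular numbers. The Python sketch below is an illustration only: the function name `rational_between` and the sample endpoints are assumptions made for the example, and the endpoints are taken to be rational (exact `Fraction` values) merely so that every step can be carried out without rounding.

```python
from fractions import Fraction
import math

def rational_between(a, b):
    """Follow the text: choose n with 1 < n(b - a), then the smallest integer m > n*a."""
    if not a < b:
        raise ValueError("need a < b")
    n = math.floor(1 / (b - a)) + 1   # guarantees 1 < n * (b - a)
    m = math.floor(n * a) + 1         # smallest integer exceeding n * a
    return Fraction(m, n)

a, b = Fraction(314159, 100000), Fraction(31416, 10000)
r = rational_between(a, b)
print(r, a < r < b)
```

With these endpoints the sketch takes $n = 100001$ and $m = 314163$. The denominator grows as $b - a$ shrinks, in keeping with the requirement $1 < n(b - a)$.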

It follows from the previous paragraph that if x is a real number, rational or 
irrational, we can find a rational number r as near it as we please. That is, if e is 
any positive number, we can find r so that x-e<r<x + e. We do in fact, in any 
actual computations or measurements, use rational approximations to real 
numbers. For many purposes we even limit ourselves to special kinds of rational 
numbers, namely decimal fractions. A decimal fraction is simply a rational 
number of the form $m/10^n$, where $m$ and $n$ are integers and $n \ge 0$. There is a
number of this form between any two real numbers, as may be readily shown
from the fact that if $a < b$, then $1 < 10^n(b - a)$ for a suitably chosen $n$.

EXERCISES 

1. Assume that $\sqrt{2} = m/n$, where $m$ and $n$ are integers and the fraction is reduced to
its lowest terms. Then $m^2 = 2n^2$. Now deduce a contradiction, and hence prove that $\sqrt{2}$ is
irrational. Use the fact that if the product of two integers is even, then at least one of
them is even.

2. Show that if $a < b$ there is an irrational number $x$ between $a$ and $b$.
Suggestion: Let $y$ be a positive irrational number, e.g., $\sqrt{2}$. Then show that there are
integers $m$, $n$ with $m \ne 0$ and $n > 0$, such that $a < (m/n)y < b$. Why is $x = (m/n)y$
irrational?

3. Prove by induction that $n < 10^n$ if $n$ is a positive integer. Then prove that if $a < b$,
there are integers $m$, $n$, with $n > 0$, such that $a < m/10^n < b$.


2.6 / THE AXIS OF REALS 

It is customary and convenient to use geometric language a good deal in
speaking about the number system. On a given straight line we take an arbitrary
point as an origin, an arbitrary direction as positive, and
an arbitrary unit of length. We then mark off segments
of unit length on either side of the origin, thus obtaining
the points which we label as shown in Fig. 19. There
is a one-to-one correspondence between the real numbers
and the points on the line. This entitles us, for brevity, to speak of “the point a”
instead of “the point corresponding to the number a.” The inequality a < b has the
geometrical interpretation that b lies in the positive direction from a along the line.

Fig. 19.

We call the line, thus regarded as a geometrical representation of the real
number system, the axis of reals, or the real number scale.


2.7 / LEAST UPPER BOUNDS 

By a set of real numbers we mean an aggregate or class of numbers. It may be 
formed according to any rule, and the number of its members may be finite or 
infinite. If the conditions laid down for determining the set are such that no 
number satisfies them, the set is said to be empty. It is very convenient to have a 
brief symbolism to indicate that a number belongs to a given set. The statement 
that the number s belongs to the set S is expressed symbolically in the form
$s \in S$ (read “s is a member of S”). The symbolic form of the statement that s does
not belong to S is $s \notin S$. Thus, if S is the set of prime positive integers, $3 \in S$
and $8 \notin S$.

If S is a set of numbers, and if M is a number such that $s \le M$ for each
$s \in S$, we say that M is an upper bound of S. Evidently any number larger than
M is also an upper bound of S. If A is an upper bound of S and if there is no
number smaller than A which is also an upper bound for S, we call A the least
upper bound of S. Obviously a set cannot have more than one least upper
bound.

Example. The set S of numbers of the form n/(n + 1), for all positive integers
n, consists of 1/2, 2/3, 3/4, 4/5, etc. Evidently 1 is an upper bound of S. But
more is true; 1 is the (unique) least upper bound of S. To verify this we must
show that if c < 1, c cannot be an upper bound of S, i.e., that there is some n
such that c < n/(n + 1). To get such an n we appeal to Theorem I in §2.4, which
tells us that there exists an n such that 1 < n(1 - c). But c < 1, and so c <
n(1 - c) = n - nc, or nc + c < n. But then (n + 1)c < n, and so c < n/(n + 1). This
completes the argument.
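
The argument of the Example can be traced numerically. The following is a minimal Python sketch, not part of the original text; the function name `witness` is a hypothetical helper, and exact rational arithmetic is assumed so that the inequalities are checked without rounding. For a given rational c < 1 it produces an n with 1 < n(1 - c), and hence with c < n/(n + 1).

```python
from fractions import Fraction
import math

def witness(c):
    """Return a positive integer n with c < n/(n + 1), for a rational c < 1.

    Hypothetical helper: it takes the smallest n with 1 < n(1 - c),
    which is the n whose existence Theorem I of section 2.4 guarantees.
    """
    c = Fraction(c)
    if c >= 1:
        raise ValueError("c must be less than 1")
    # n(1 - c) > 1 is the same as n > 1/(1 - c), because 1 - c > 0.
    return math.floor(1 / (1 - c)) + 1

n = witness(Fraction(9, 10))
print(n, Fraction(n, n + 1) > Fraction(9, 10))  # 11 True
```

No matter how close c is taken to 1, some member of S exceeds it, which is exactly why no number smaller than 1 can be an upper bound of S.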

The following theorem is of fundamental importance: 

THEOREM II. If S is a set of real numbers which is not empty and which has an
upper bound, then it has a least upper bound.

Proof. We appeal to the axiom of continuity. Let L be the set of all numbers
x such that x < s for some s in S, and let R be the set of all numbers y such that
$s \le y$ for every s in S. Clearly L and R together comprise all real numbers. If
$s \in S$, then $s - 1 \in L$; $A \in R$ if A is an upper bound of S. Thus neither L nor R
is empty. If $x \in L$ and $y \in R$, we have x < s for some s in S. But $s \le y$, and
therefore x < y. We have therefore defined a cut. Let c be the cut number. We
shall prove that c is the least upper bound of S. It certainly is an upper bound.
For if we suppose c < s for some s in S, we can choose a number z between c
and s. Then $z \in R$ since c < z, and $z \in L$ since z < s; thus we have a
contradiction, for a number cannot belong to both L and R. If b is any number smaller
than c, the properties of the cut number ensure that $b \in L$, and hence b < s for
some s in S. Thus b cannot be an upper bound of S. The proof that c is the least
upper bound of S is now complete.

Theorem II expresses a property of the real number system which is a direct 
consequence of the axiom of continuity. It is easily demonstrated that if the 
statement of Theorem II is taken as an axiom concerning the real numbers, the 
truth of the axiom of continuity may be deduced (making it a theorem instead of 
an axiom). For a key to this demonstration see Exercise 1. Thus the axiom of 
continuity and the existence of least upper bounds as stated in Theorem II are 
equivalent propositions. Hereafter, in arguments where we could lean equally 
well either on the axiom of continuity or on Theorem II, we shall usually appeal 
to the latter. 

As an immediate application we shall prove Theorem XIII of Chapter 
1 (§1.62). We reword it slightly. 

THEOREM III. Let $\{x_n\}$ be a sequence such that $x_1 \le x_2 \le \cdots \le x_n \le x_{n+1} \le \cdots$,
and suppose that the set of numbers $x_n$ has an upper bound: $x_n \le M$ for every
n. Then the sequence is convergent, its limit being the least upper bound of
the numbers $x_n$.

Proof. Let A be the least upper bound of the numbers $x_n$. Then if $\epsilon > 0$ we
have $A - \epsilon < x_n$ for some n, say n = N, and $x_n \le A$ for every n. Since $x_N \le x_n$ for
every $n \ge N$ (by virtue of the assumption that $x_n \le x_{n+1}$), we see that
$A - \epsilon < x_n \le A$ if $N \le n$. Thus by definition $\lim_{n\to\infty} x_n = A$. This proves the theorem.
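
The proof turns on finding, for each positive ε, an index N with A - ε < x_N. A small Python sketch of that search follows; it is an illustration only, and the sequence x_n = 2 - 1/n (with least upper bound A = 2), the function name `first_index_within`, and the search cap `max_n` are all assumptions made for the example.

```python
from fractions import Fraction

def first_index_within(x, A, eps, max_n=10**6):
    """Smallest N >= 1 with A - eps < x(N).

    Assumes x is nondecreasing with least upper bound A; max_n is an
    arbitrary cap so that the search always terminates.
    """
    for N in range(1, max_n + 1):
        if A - eps < x(N):
            return N
    raise ValueError("no index found; is A really the least upper bound?")

def x(n):
    # Illustrative nondecreasing sequence with least upper bound 2.
    return 2 - Fraction(1, n)

eps = Fraction(1, 100)
N = first_index_within(x, A=2, eps=eps)
print(N)  # 101
# By monotonicity every later term also lies in (A - eps, A].
print(all(2 - eps < x(n) <= 2 for n in range(N, N + 500)))  # True
```

Once one term lies within ε of A, monotonicity keeps every later term there; this is the whole content of the proof.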

The notions of lower bound of a set, and of the greatest lower bound, are
defined in exactly the same way as upper bound and least upper bound, except 
that the notions of “less than” and “least” are replaced throughout by “greater 
than” and “greatest.” We may summarize the defining properties of the least 
upper bound and greatest lower bound as follows: 

The set S has the least upper bound A if $s \le A$ for every s in S and if, $\epsilon$
being any positive number, $A - \epsilon < s$ for at least one s in S.

The set S has the greatest lower bound B if $B \le s$ for every s in S and, $\epsilon$
being any positive number, $s < B + \epsilon$ for at least one s in S.

THEOREM IV. If S is a set of real numbers which is not empty and which has a
lower bound, then it has a greatest lower bound.

Proof. We could give a proof similar to that of Theorem II. Instead, we base
the proof on Theorem II itself. Define a set T as consisting of all numbers
$t = -s$, where $s \in S$. If M is a lower bound of S, we have $M \le s$, or $-s \le -M$ for
each s in S. Thus T has an upper bound $-M$. By Theorem II, T has a least upper
bound, say A. We leave it for the student to show that $-A$ is the greatest lower
bound of S, thus completing the proof.

Just as Theorem IV matches Theorem II, so there is a theorem which 
matches Theorem III. 

THEOREM V. Let $\{x_n\}$ be a sequence such that $x_1 \ge x_2 \ge x_3 \ge \cdots$ (in general
$x_n \ge x_{n+1}$), and suppose that the set of numbers $x_n$ has a lower bound: $x_n \ge M$ for
every n. Then the sequence is convergent, its limit being the greatest lower
bound of the numbers $x_n$.

We leave it for the student to base a proof of this theorem on Theorem IV or 
to deduce the proof from Theorem III by considering the sequence $\{y_n\} = \{-x_n\}$.

EXERCISES 

1. Let L and R be sets of real numbers satisfying the conditions of the axiom of 
continuity. Assuming the truth of Theorem II, show that the set L has a least upper 
bound c and that this least upper bound has the properties asserted for the cut number in
the axiom of continuity. 

2. Prove Theorem IV by an argument similar to that used in proving Theorem II, 
using the axiom of continuity. 

3. Carry out the proof of Theorem V in each of two ways, as suggested in the text. 

4. Prove Theorem I (§2.4), starting as follows: Suppose the assertion in the theorem
false, so that $na \le b$ for all positive integers n. Use Theorem II and the defining property
of a least upper bound to arrive at a contradiction.

5. Let $x_1 = \sqrt{2}$, $x_{n+1} = \sqrt{2 + x_n}$. Use mathematical induction to show that $x_n < x_{n+1}$.
Next show that if $x_n \ge 2$ (where n > 1), then $x_{n-1} \ge 2$ also. How do you conclude from this
that $x_n < 2$ for all n? State why $\lim_{n\to\infty} x_n$ exists, and find the limit.

6. Suppose c > 0, and let $x_1 = \sqrt{c}$, $x_{n+1} = \sqrt{c + x_n}$. Show that $x_n < x_{n+1}$ and
$x_n < 1 + \sqrt{c}$ (see Exercise 5). State why $\lim_{n\to\infty} x_n$ exists, and find the limit.


2.8 / NESTED INTERVALS 

The theorem which we are going to discuss in this section is an immediate 
consequence of Theorems III and V of §2.7. We are going to use the language of 
geometry rather than the language of numbers; the theorem is about closed 
intervals on the real axis. It will be convenient to denote intervals by single 
letters, such as $I$, $I_1$, $I_2$, and so on. If $I_1$ and $I_2$ are closed intervals, and if the end
points of $I_2$ lie in $I_1$, we say that $I_2$ is contained in $I_1$.

Now suppose that we are given a sequence of closed intervals, $I_1, I_2, I_3, \ldots, I_n, \ldots$
with the property that $I_2$ is contained in $I_1$, $I_3$ is contained in $I_2$, and so on;
that is, we assume that each interval contains the one which follows it in the
sequence. Let us denote the length of the interval $I_n$ by the symbol $l_n$, and let us
assume that $\lim_{n\to\infty} l_n = 0$. In these circumstances we shall say that $\{I_n\}$ is a
sequence of nested intervals, or that the sequence forms a nest.

THEOREM VI. If $\{I_n\}$ is a nest of closed intervals, there is exactly one point
which is common to all the intervals.


Proof. Let the interval $I_n$ be described by the inequalities $a_n \le x \le b_n$
($a_n < b_n$). The fact that $\{I_n\}$ is a nest is then described as follows. The
inequalities

$$a_n \le a_{n+1}, \qquad b_{n+1} \le b_n, \qquad n = 1, 2, \ldots \tag{2.8-1}$$

correspond to the fact that $I_n$ contains $I_{n+1}$. Also

$$\lim_{n\to\infty} (b_n - a_n) = 0, \tag{2.8-2}$$

since $l_n = b_n - a_n$. Now $a_n < b_n \le b_1$ and $a_1 \le a_n < b_n$. It now follows by Theorem
III of §2.7 that the sequence $\{a_n\}$ has a limit; likewise, by Theorem V, $\{b_n\}$ has a
limit. Let us write

$$\lim_{n\to\infty} a_n = a, \qquad \lim_{n\to\infty} b_n = b.$$

These two limits must coincide, that is a = b, by virtue of (2.8-2). Now
$a_n \le a = b$, for a is the least upper bound of the sequence $\{a_n\}$; likewise
$a = b \le b_n$. Hence the point a belongs to every one of the intervals $I_n$.
Obviously there cannot be more than one such point common to all the
intervals. For if two such points existed, say at a positive distance h apart, we
could choose n so large that $l_n = b_n - a_n < h$, and under this condition the two
points could not both lie in $I_n$.

Fig. 20.

Theorem VI is illustrated in Fig. 20. Uses of
Theorem VI will appear in the next chapter.
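
A concrete nest may help fix the ideas. The Python sketch below is an illustration that does not appear in the original text: it builds a nest by repeated bisection of [1, 2], always keeping the half whose end points a, b satisfy $a^2 < 2 < b^2$. The function name `nest_around_sqrt2` is hypothetical, and exact rational arithmetic is assumed. The single point common to all the intervals is then $\sqrt{2}$.

```python
from fractions import Fraction

def nest_around_sqrt2(steps):
    """Return the closed intervals I_1, ..., I_(steps+1) as (a_n, b_n) pairs.

    I_1 = [1, 2]; each later interval is the half of its predecessor
    whose end points still satisfy a^2 < 2 < b^2.
    """
    a, b = Fraction(1), Fraction(2)
    intervals = [(a, b)]
    for _ in range(steps):
        mid = (a + b) / 2
        # mid is rational, so mid * mid is never exactly 2.
        if mid * mid < 2:
            a = mid
        else:
            b = mid
        intervals.append((a, b))
    return intervals

nest = nest_around_sqrt2(20)
a, b = nest[-1]
print(float(a), float(b), b - a == Fraction(1, 2**20))
```

The left end points never decrease, the right end points never increase, and the lengths $1/2^{n-1}$ tend to zero, so the conditions (2.8-1) and (2.8-2) both hold.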

MISCELLANEOUS EXERCISES 

1. Suppose f is defined as follows: f(x) = 1 if x is a rational number, f(x) = 0 if x is
an irrational number. For what values of x (if any) is f continuous?

2. Show that $\lim_{n\to\infty} \frac{n^n}{n!\,e^n}$ exists without finding the limit.

3. Show that lim - (~ 2 . 3 4 5 - ’ 2 '1 2 7L L I ex j sts without finding the limit. 

" —n 1 1 - 3 - • • (2n - 1)) 

4. If $x_n = (\log n)/n$, show that $x_n > x_{n+1}$ when $n \ge N$, where N is a certain fixed
integer. What is N? What do you conclude about the sequence $\{x_n\}$?

5. Let S be the set of all rational numbers r such that $r^2 < 2$. What is the least upper
bound of S? What is the greatest lower bound of S? 


6. Let S be the point set consisting of all the points x„ = (-l) n j^2 - ^-j, n = 1,

2, . . . . Find the least upper bound and greatest lower bound of S. 

7. The same as Exercise 6, except that $x_n = (-1)^n + (1/n)$.

8. Let a sequence of numbers $a_1, a_2, \ldots$ be such that $(2 - a_n) a_{n+1} = 1$. (a) Show that
$\lim_{n\to\infty} a_n$ exists. Consider two cases: either $a_n < 2$ for all n, or $a_n > 2$ for some n. (b) Find
the limit of the sequence.

9. Let $x_1$ and $y_1$ be given with $x_1 > y_1 > 0$, and define

$$x_{n+1} = \frac{x_n + y_n}{2}, \qquad y_{n+1} = \sqrt{x_n y_n}, \qquad n = 1, 2, \ldots$$

Show (a) that $y_n < x_n$, (b) that $y_n < x_{n+1} < x_n$, (c) that $0 < x_{n+1} - y_{n+1} < (x_1 - y_1)/2^n$.

Explain why the sequences $\{x_n\}$ and $\{y_n\}$ are convergent and have the same limit.

10. The number $\sqrt{3}$ is defined as the positive number c such that $c^2 = 3$. That there
is such a number may be proved as follows: Let S be the set of all positive numbers x
such that $x^2 < 3$. This set is not empty, since $1 \in S$. It is a bounded set, for, if $x \in S$, either
$x \le 1$ or x > 1, and in the latter case $x < x^2 < 3$. By Theorem II, S has a least upper bound,
which we shall denote by c. It remains to prove that $c^2 = 3$. (Why must c be positive?) If
$\epsilon > 0$, there is some x in S such that $c - \epsilon < x$. Hence $(c - \epsilon)^2 < 3$, or $c^2 - 2c\epsilon + \epsilon^2 < 3$,
since $x^2 < 3$. Now let $\epsilon \to 0$, and by Theorem XI, §1.61, we conclude that $c^2 \le 3$. Finally,
we show that $c^2 < 3$ is impossible. For, if $c^2 < 3$, the fact that $\lim_{h\to 0} (c^2 + 2ch + h^2) = c^2$
shows that $(c + h)^2 < 3$ if h is sufficiently small. This means that c + h is in S if h is any
sufficiently small positive number. Since c is the least upper bound of S, this is a
contradiction. Therefore $c^2 = 3$.

Now let the student develop a similar argument to show that if A > 0 and n is a
positive integer, there is a positive number c such that $c^n = A$.


3 / CONTINUITY

In §1.12 and §1.2 we pointed out the need to know that if a function is
continuous at all points of a finite closed interval it actually attains an absolute 
maximum and an absolute minimum at points of the interval. Again, in §1.51, we 
saw that another property of continuous functions occupies a key position in the 
proof of the mean-value theorem for integrals. After our study of the real 
number system in Chapter 2 we are prepared to prove that continuous functions 
do in fact possess the properties referred to above. 

The definition of a continuous function was given in §1.1. We repeat the 
definition. 


Definition. Let f be a function which is defined in some interval containing the point
$x_0$ either inside or at one end. We say that f is continuous at $x_0$ provided that
$\lim_{x\to x_0} f(x) = f(x_0)$. If $x_0$ is at one end of the interval, x must approach $x_0$ from one
side only. We say that f is continuous on an interval if it is continuous at each point
of the interval.


Often it is convenient to express the definition of continuity in an alternative
but equivalent way, using inequalities: f is continuous at $x_0$ provided that to each
positive number $\epsilon$ corresponds some positive number $\delta$ such that
$|f(x) - f(x_0)| < \epsilon$ whenever $|x - x_0| < \delta$ and x is in the interval on which f is
defined. Observe that $|f(x) - f(x_0)| < \epsilon$ is equivalent to the double inequality
$f(x_0) - \epsilon < f(x) < f(x_0) + \epsilon$. The choice of the number $\delta$ will as a rule depend both
on $\epsilon$ and on $x_0$ (and of course on the particular function f).
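
To see how the choice of δ can depend on both ε and $x_0$, consider the particular function $f(x) = x^2$, which is our own choice of example and not one made in the text. If $|x - x_0| < \delta \le 1$, then $|x + x_0| < 2|x_0| + 1$, so that $|x^2 - x_0^2| < \delta(2|x_0| + 1)$; hence $\delta = \min(1, \epsilon/(2|x_0| + 1))$ will serve. The Python sketch below, with hypothetical function names and exact rational arithmetic, spot-checks this choice on sample points.

```python
from fractions import Fraction

def delta_for_square(x0, eps):
    """One admissible delta for f(x) = x^2 at x0 (an illustrative choice)."""
    x0, eps = Fraction(x0), Fraction(eps)
    return min(Fraction(1), eps / (2 * abs(x0) + 1))

def check(x0, eps, samples=200):
    """Test |f(x) - f(x0)| < eps at sample points with |x - x0| < delta."""
    x0, eps = Fraction(x0), Fraction(eps)
    d = delta_for_square(x0, eps)
    for k in range(-samples + 1, samples):
        x = x0 + d * Fraction(k, samples)  # strictly inside (x0 - d, x0 + d)
        if not abs(x * x - x0 * x0) < eps:
            return False
    return True

print(delta_for_square(3, Fraction(1, 10)))   # 1/70
print(delta_for_square(50, Fraction(1, 10)))  # 1/1010
print(check(3, Fraction(1, 10)), check(50, Fraction(1, 10)))  # True True
```

For the same ε the admissible δ shrinks as $|x_0|$ grows. A finite sample is of course only a spot check; the inequality above is what proves the claim.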

Among the important theorems about continuity is the following assertion 
about the continuity of sums, products, and quotients: 


THEOREM I. Let f and g be functions defined on the same interval. If f(x) and
g(x) are continuous at a point $x = x_0$, so are f(x) + g(x) and f(x) · g(x). If
$g(x_0) \ne 0$, the quotient f(x)/g(x) is also continuous at $x = x_0$.


The proof stems directly from the fundamental limit theorem (Theorem XIV,
§1.64). We have, for example,

$$\lim_{x\to x_0} \frac{f(x)}{g(x)} = \frac{\lim_{x\to x_0} f(x)}{\lim_{x\to x_0} g(x)} = \frac{f(x_0)}{g(x_0)},$$

provided $g(x_0) \ne 0$. This is precisely the statement that the quotient is
continuous at $x = x_0$. The other parts of the theorem are proved in the same way.


EXERCISES 

1. Assuming it is known that the functions sin x, cos x are continuous for all values 
of x, discuss the continuity of tan x and ctn x. Is either of these functions discontinuous at
a point where it is defined? 

2. If f is defined and continuous on $a \le x \le b$, is 1/f(x) defined on the same interval?
What about continuity of 1/f(x)? May it be discontinuous at a point where it is defined?

3. May a product f(x) · g(x) be continuous at a point $x_0$ where g is discontinuous?
Support your answer. May the product be continuous at a point where both f and g are
discontinuous?

4. May a sum f(x) + g(x) be continuous at a point where f is continuous and g
discontinuous? May the sum be continuous at a point where both f and g are
discontinuous?


5. Suppose f(x), g(x), and f(x)/g(x) are defined in some open interval containing $x_0$, and
suppose f(x)/g(x) and g(x) are continuous at $x_0$. May f be discontinuous at $x_0$?


3.1 / BOUNDED FUNCTIONS 

A set of real numbers is said to be bounded if the set has both an upper and a
lower bound. Consider a function f, defined on a given interval. We say that the
function is bounded on the interval if the set of all the values of the function is a
bounded set. This means that there is some number A such that $|f(x)| \le A$ for all
x on the interval; or, alternatively, there are two numbers m, M such that
$m \le f(x) \le M$ for all x on the interval. We make this definition whether or not
the interval on which the function is defined is closed.

Example 1. The function f(x) = sin x is bounded on the interval $0 \le x \le 2\pi$,
because $-1 \le \sin x \le 1$. Actually, the function is bounded on every interval in
this case.

Example 2. The function f(x) = 1/x is not bounded on the interval $0 < x \le 1$,
for there is no upper bound to its values. We can make f(x) as large as we please
by taking x sufficiently near zero.

Example 3. The function f(x) = x sin x is not bounded on the infinite interval
$0 \le x$. There is neither an upper nor a lower bound to the values of the function,
since $f(n\pi/2) = n\pi/2$ if n = 1, 5, 9, 13, . . . and $f(n\pi/2) = -n\pi/2$ if n = 3, 7, 11,
15, . . . .
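
Example 3 can be checked numerically. The following Python sketch is an illustration only: the helper `point_exceeding` is a hypothetical name, floating-point arithmetic is assumed, and the margin of 1 in the search absorbs rounding error. For a proposed bound B it finds a point of the form $n\pi/2$ with n = 1, 5, 9, . . . at which the function exceeds B.

```python
import math

def f(x):
    return x * math.sin(x)

def point_exceeding(B):
    """Return x = n*pi/2, with n = 1, 5, 9, ..., such that f(x) > B.

    Assumes B >= 0 and B of moderate size, so that floating-point
    evaluation of sin near its maxima stays accurate.
    """
    n = 1
    while n * math.pi / 2 <= B + 1:
        n += 4  # stay on the values of n where sin(n*pi/2) = 1
    return n * math.pi / 2

x = point_exceeding(100)
print(f(x) > 100)  # True
print(math.isclose(f(3 * math.pi / 2), -3 * math.pi / 2))  # True
```

The same search with n = 3, 7, 11, . . . produces values below any proposed lower bound, so no number A can satisfy $|f(x)| \le A$ on the whole interval.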

THEOREM II. If a function is continuous on a finite closed interval, it is
bounded on that interval.

First Proof. Let the function f be continuous on the interval $a \le x \le b$. We
observe in the first place that if $x_0$ is any point of the interval [a, b] there is some
subinterval containing $x_0$ in which f is bounded. For, if we take $\epsilon = 1$ in the
criterion for continuity as stated in §3, and denote the corresponding $\delta$ by $\delta_1$, we
have $f(x_0) - 1 < f(x) < f(x_0) + 1$ provided that x is a point of the interval [a, b]
such that $x_0 - \delta_1 < x < x_0 + \delta_1$. Note that this subinterval extends on both sides of
$x_0$ if $a < x_0 < b$, and on one side of $x_0$ if $x_0 = a$ or $x_0 = b$. To show the dependence
of $\delta_1$ on $x_0$ we shall write $\delta_1 = \delta_1(x_0)$. A second observation of importance is this:
If f is bounded on each of two abutting or overlapping subintervals, then it is
bounded on the single interval which consists of all points which belong to either
one or both of the original subintervals. For example, if $|f(x)| \le 10$ when
$0 \le x \le 3/2$ and $|f(x)| \le 15$ when $1 \le x \le 3$, then certainly $|f(x)| \le 15$ when
$0 \le x \le 3$.

Now let us define T as the set of all numbers t for which $a < t \le b$ and such
that f is bounded on the interval $a \le x \le t$. The set T is not empty, for by the
previous paragraph it certainly contains t if $a < t < a + \delta_1(a)$ (see Fig. 21).
Furthermore, T has the upper bound b. Therefore, by Theorem II, §2.7, T has a
least upper bound, say c. To complete the proof we shall prove two things: (1)
that c belongs to T, so that f is bounded on [a, c], and (2) that c = b. By
definition of c we know that either c belongs to T, or there are points of T as
near c as we please on the left of c. Now, as we observed at the outset, f is
bounded in some subinterval containing c and extending at least a distance $\delta_1(c)$
to the left of c. There will certainly be a point of T, say $t_1$, such that
$c - \delta_1(c) < t_1 \le c$. If $t_1 = c$, (1) is clear; if $t_1 < c$, f is bounded on each of the
overlapping intervals $[a, t_1]$, $[c - \delta_1(c), c]$, and hence on the single interval [a, c].
In either case we see that c belongs to T. This proves (1).

To prove (2) we assume c < b and deduce a contradiction. Once more we
use the fact that f is bounded in some subinterval containing c; since c < b this
subinterval will extend somewhat to the right of c, say as far as $c + \delta_1(c)$. But
now we know that f is bounded on each of the abutting intervals [a, c],
$[c, c + \delta_1(c)]$, and hence on the single interval $[a, c + \delta_1(c)]$. This means,
however, that $c + \delta_1(c)$ is in the set T, by definition. Here we have a
contradiction, since $c + \delta_1(c)$ is greater than the least upper bound of T. We have thus
proved (2), and thereby completed the proof of the theorem.

Second Proof. This proof uses the theorem of nested intervals (Theorem VI,
§2.8). As in the first proof, we observe that if a function is bounded on each of
two abutting closed intervals, it is bounded on the interval which is obtained by
combining the two intervals into one. Now suppose the theorem to be proved
were false; that is, suppose f is not bounded on [a, b]. Denote the interval [a, b]
by $I_1$. Consider the closed intervals [a, (a + b)/2], [(a + b)/2, b] obtained by
bisecting $I_1$. On at least one of these closed subintervals (denote such a one by
$I_2$) f must fail to be bounded. We proceed to bisect $I_2$, obtaining a new
subinterval $I_3$ on which f fails to be bounded. By repetition of this process we
generate a sequence $\{I_n\}$ of closed intervals on each of which f is not bounded.
The length of $I_n$ is $(b - a)/2^{n-1}$. Hence it is clear that $\{I_n\}$ is a nest, as defined in
§2.8. By Theorem VI of §2.8 there is a single point, say x = c, which is in each of
the intervals $I_n$, and hence in the interval [a, b]. Now, as shown at the beginning
of our first proof, f is bounded on some interval containing the point c. Denote
such an interval by J. Since the length of $I_n$ tends to zero as n increases, and
since c is in $I_n$, it is clear that J must contain $I_n$ when n is sufficiently large. But
this involves a contradiction, for f is not bounded on $I_n$ and it is bounded on J.
Because of this contradiction, our initial assumption that the theorem is false
must be rejected. We have thus completed the proof.

EXERCISES 

1. Let f(x) = 2x sin(1/x) - cos(1/x). Is this function bounded on the interval $0 < x \le 1$?

2. Consider the function $\tan^{-1} x$, defined for all values of x. Is it bounded?

3. Which of the following functions are bounded on the indicated intervals? 

(a) 7~“> — 1 < x < 1 ; (c) \ . ,0Sjeg2u; 

1 + x V5-2sinx 

(b) ^,0<x<l; (d) 0<x^y; (e) ^ sin 0 < x ^ 1. 

A A Z A A 

4. Without attempting to find exact absolute maxima, find numbers M such that
$|f(x)| \le M$ on the intervals indicated in each of the following cases:

(a) $f(x) = x^{17} - 6x^{13} + 5x^2 - 2$, $-1 \le x \le 1$;

(b) /(x) = 3 sin 2 x - 2 cos x - sin ~ cos j, 0 ^ x ^ 2ir ; 

(c) $f(x) = \frac{x^3 - x^2 + 1}{1 + x^4}$, $-1 \le x \le 2$.

3.2 / THE ATTAINMENT OF EXTREME VALUES 

Suppose we are given a function f, and suppose we know that the function is
bounded on a certain given interval. Let m and M be the greatest lower bound 
and least upper bound, respectively, of the values of f(x) 
on the given interval. Is it necessarily the case that f(x) 
actually takes on the values m and M on the interval? 

Examples show that the answer to this question is negative. 

Example 1. Suppose we define $f(x) = x^2$ if $0 \le x < 1$,
f(x) = 0 if x = 1 (see Fig. 22). This function has M = 1 for
the interval $0 \le x \le 1$, but there is no x on the interval such
that f(x) = 1. Note, however, that the function is not
continuous at x = 1.

Example 2. Let $f(x) = \frac{1}{x+1} \sin \frac{1}{x}$, 0 < x. This function
is continuous for all positive values of x. We have

$$\left| \frac{1}{x+1} \sin \frac{1}{x} \right| \le \frac{1}{1+x} < 1$$

and hence -1 < f(x) < 1 when x > 0. Actually m = -1
and M = 1 for this function on any interval $0 < x \le a$,
where a > 0. As x approaches 0, f(x) oscillates in value
between 1/(x + 1) and -1/(x + 1) an infinite number of
times (see Fig. 23). If N is any number such that
-1 < N < 1, there are infinitely many values of x as
near 0 as we please such that f(x) = N. But the values
±1 are never attained.

Fig. 23.
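
The behavior described in Example 2 can be observed numerically. The Python sketch below is an illustration under an assumption: it takes the function of the example to be $f(x) = \frac{1}{x+1}\sin\frac{1}{x}$ and evaluates it at the points $x_k = 1/(\pi/2 + 2\pi k)$, where $\sin(1/x_k) = 1$. The helper name `peak` is hypothetical, and floating-point arithmetic is used.

```python
import math

def f(x):
    # Assumed form of the function in Example 2.
    return math.sin(1.0 / x) / (x + 1.0)

def peak(k):
    """x_k = 1/(pi/2 + 2*pi*k), a point where sin(1/x) equals 1."""
    return 1.0 / (math.pi / 2 + 2 * math.pi * k)

values = [f(peak(k)) for k in (1, 10, 100, 1000)]
print(values)
# The values increase toward 1 as x_k approaches 0, yet each stays below 1.
print(all(v < 1 for v in values), values == sorted(values))  # True True
```

At each such point f(x) equals 1/(x + 1), which is less than 1 because x > 0; so the least upper bound M = 1 is approached but never attained.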


There is, however, a companion to Theorem II of the preceding section, in
which it was assumed that / is continuous on an interval that is both finite and 
closed. In that case the answer to the question raised in the opening paragraph of the 
present section is positive. 


THEOREM III. Let f be continuous when $a \le x \le b$, and let m and M be the
greatest lower bound and least upper bound of the values f(x) on this
interval. Then f(x) assumes each of the values m and M at least once in the
interval.


Proof. Suppose the value M is never attained, so that M - f(x) > 0 for all x
in the interval. Consider the function

$$g(x) = \frac{1}{M - f(x)}.$$

It is a continuous function, since the denominator never vanishes (see Theorem
I, §3). Hence, by Theorem II, §3.1, g(x) is bounded. Let A be an upper bound of
g(x) (A is necessarily positive):

$$\frac{1}{M - f(x)} \le A.$$

This inequality may be transformed successively into

$$\frac{1}{A} \le M - f(x), \qquad f(x) \le M - \frac{1}{A}.$$

But we now have the result that M - (1/A) is an upper bound for f(x). This is a
contradiction, however, for M - (1/A) < M, and M is the least upper bound of
f(x). We must then conclude that M - f(x) vanishes for some value of x. This
completes the proof so far as M is concerned. A similar proof can be given that
the value m is assumed. Alternatively, one can consider the function -f(x),
whose least upper bound is -m. Then, by what has already been proved,
-f(x) = -m must hold for some x. This is equivalent to f(x) = m, of course.

EXERCISES 

1. Let f(x) = 3x if 0 < x < 1, and define f(0) = 1, f(1) = 2. Find m and M for f on the
interval [0, 1]. Are these extreme values attained? What can you say about the continuity
of f?

2. Let f(x) = x - [x], where [x] denotes the greatest integer less than or equal to x.
Find m and M for f on [0, 2]. Are these extreme values attained? Graph the function.

3. Find m and M for $f(x) = \tan^{-1} x$, where x can range over all values. Are these
extreme values attained?

4. Let f(x) = — y* 4 + ~ * 4 sin — , 0 < i j 1. 

2 2 jc 

Find m and M for this function on the indicated interval. Are these extreme values 
attained? 

5. Given a point Q and a circle in the xy-plane, with Q not on the circle, explain with
the aid of Theorem III how you know that there is a point $P_1$ on the circle which is at
least as near to Q as any other point of the circle is; also explain why there is a point $P_2$
on the circle which is at least as far from Q as any other point on the circle is.

6. Given a point Q and a parabola in the xy-plane, with Q not on the parabola,
explain why there is a point P on the parabola at least as close to Q as any other point of
the parabola is. Why, in contrast to the case of the circle in Exercise 5, is there no point
of the parabola at maximum distance from Q? If you wish, you may choose a co-ordinate
system in which the parabola has a very simple equation.


3.3 / THE INTERMEDIATE-VALUE THEOREM 

THEOREM IV. Suppose that f is continuous on the closed interval $a \le x \le b$,
and that $f(a) \ne f(b)$. Then, as x varies from a to b, f(x) takes on every value
between f(a) and f(b).

This theorem expresses a property of continuous functions which has a simple
geometric interpretation. Suppose, for example, that f(a) < f(b), and that k is a
number between f(a) and f(b). Consider the graph of y = f(x). It is a continuous
curve joining the points (a, f(a)) and (b, f(b)). These points are on opposite
sides of the line y = k; the theorem asserts that the curve y = f(x) intersects
the line y = k at some point x = c between a and b (see Fig. 24).

Fig. 24.

With this geometric interpretation, the student may be strongly tempted to 
say that the theorem is obviously true and requires no further proof. If the 
student does so, however, he or she is not relying upon our definition of 
continuity given in §3, but upon intuitive assumptions about the geometrical 
meaning of the term “continuous curve.” We wish to show that the theorem can 
be proved on the basis of the definition.

Before proving Theorem IV we shall find it convenient to prove the
following proposition:

THEOREM V. Let f be defined on an interval containing the point c, and let f be
continuous at x = c. Then, if $f(c) \ne 0$, there is a subinterval containing c
throughout which f(x) has the same sign as f(c).

Suppose, for example, that f(x) is defined when c - h < x < c + h, where h > 0,
and that f(c) > 0. Then if f is continuous at x = c the theorem asserts that we can
choose $\delta$, $0 < \delta < h$, so that f(x) > 0 when $|x - c| < \delta$ (see Fig. 25). The proof is
an immediate consequence of Theorem X, §1.61, since $\lim_{x\to c} f(x) = f(c) > 0$. The
proof when f(c) < 0 is left to the student. If c is at one end of the interval on
which f is defined, the subinterval on which f is of constant sign will extend on
one side only of c.

Fig. 25.

Proof of Theorem IV. Consider the function g(x) = f(x) - k, where k is
between f(a) and f(b). Let us assume for definiteness that f(a) < k < f(b). We
shall assume the theorem false and deduce a contradiction, thus completing the
proof. Thus we assume that g(x) is never zero on the interval [a, b].

Now g(a) < 0 and g(b) > 0. Bisect the interval [a, b]. At the midpoint g(x) is
not zero, and hence is either positive or negative. Choose that half interval for
which g(x) is positive at one end and negative at the other, denoting it by $I_2$. The
interval [a, b] we denote by $I_1$. We now repeat the bisection process,
successively obtaining intervals $I_1, I_2, I_3, \ldots$ such that g(x) is negative at the left end
of $I_n$ and positive at the right end. These intervals obviously form a nest, and so
by Theorem VI, §2.8, close down on a unique point c. Now $g(c) \ne 0$, and so by
Theorem V g(x) is of one sign throughout some interval J containing c. But such
an interval will contain $I_n$ when n is sufficiently large, because the length of $I_n$
approaches zero as n increases. Thus g(x) takes on both positive and negative
values in J. We have now reached a contradiction, and the proof is complete.
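
The bisection process of the proof can also be run constructively to locate a point where g vanishes, at least approximately. The Python sketch below is such a constructive variant, not the argument by contradiction given above; the function name `bisect_root`, the tolerance `tol`, and the sample function $g(x) = x^3 - x - 1$ are assumptions made for illustration.

```python
def bisect_root(g, a, b, tol=1e-12):
    """Approximate a zero of a continuous g on [a, b], given g(a) < 0 < g(b).

    At each step keep the half interval on which g is negative at the
    left end and positive at the right end, as in the proof.
    """
    if not (g(a) < 0 < g(b)):
        raise ValueError("need g(a) < 0 < g(b)")
    while b - a > tol:
        mid = (a + b) / 2
        gm = g(mid)
        if gm == 0:
            return mid  # the midpoint happens to be an exact zero
        if gm < 0:
            a = mid
        else:
            b = mid
    return (a + b) / 2

def g(x):
    # Illustrative function: g(1) = -1 < 0 and g(2) = 5 > 0.
    return x**3 - x - 1

c = bisect_root(g, 1.0, 2.0)
print(c, abs(g(c)) < 1e-9)
```

To find a point where f(x) = k, apply the same routine to g(x) = f(x) - k, just as the proof does.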

EXERCISES 

1. Suppose f is defined as f(x) = |x| log |x| if $x \ne 0$, f(0) = e. Without investigating the
limit of f(x) as $x \to 0$, explain why it is certain that f is not continuous at x = 0.

2. A point Q is inside a circle C. The point of C nearest Q is a distance d from Q,
and the point of C furthest from Q is a distance D from Q. Explain how you know that
there is a point P on C whose distance from Q is $\frac{1}{2}(d + D)$.

3. If f is continuous and $f(x) \ge 0$ on [a, b], while f(x) > 0 for at least one x of the
interval [a, b], prove that $\int_a^b f(x)\,dx > 0$. Give a carefully reasoned proof, not merely an
intuitive argument based on geometric plausibility.

4. Let P be a point on an ellipse. Consider rays (half-lines) emanating from P.
Explain how you know that there is one such ray which divides the area enclosed by the
ellipse into two equal parts.

5. Suppose f is continuous on [a, b] and let A be a number such that 0 < A < 1. Show
that there is an x such that a < x < b and $\int_a^x f(t)\,dt = A \int_a^b f(t)\,dt$, provided that the
condition $\int_a^b f(t)\,dt \ne 0$ is fulfilled.


MISCELLANEOUS EXERCISES 

1. Six functions are defined below, each by a certain formula. For each function 
answer the questions: (a) For what values of x is the function defined? (b) For what 

values of x is the function continuous? (c) Are there any values of x where the function is 
not already defined, but may be defined so as to make the function continuous? 

® = ? T7TT = 


(ii) *(x) 

(v) G( x) = 

(vi) H( x) = 


x — \ 

JC 2 - 1 


(iv) F(x) = 


x 2 — 16 


jc + 6— 5Vx 


x 2 — 5 jc - 50 


(x 2 — 8x — 20)Vx 2 - 25 
3x + 5 


V2x - 3 - V5x - 6 + V3x - 5 


2. Given that f is defined for all x, continuous at x = 0, and that, for all x and
y, f(x + y) = f(x)f(y), show that f is continuous for all values of x.

3. Given that f is defined for all x, continuous at x = 0, and that, for all x and y,
f(x + y) = f(x) + f(y), show that f is continuous for all values of x.

4. A function f is defined and continuous on the interval $a \le x \le b$, and f(x) = 0
when x is rational. Using Theorem V, explain why f(x) = 0 at all points of the interval.

5. Show that $|f(x) - f(x_0)| < |x - x_0|$ if $f(x) = \sqrt{4 + x^2}$ and $x \ne x_0$. What does this
prove about f?


6. Let P(x) be a polynomial of odd degree with real coefficients. Then the equation 
P(x) = 0 has at least one real root. Prove this by use of Theorem IV. 

7. Let $P(x) = x^n + a_1 x^{n-1} + \cdots + a_n$, where n is an even positive integer, the a’s are
real, and $a_n < 0$. Show that the equation P(x) = 0 has at least two real roots. What more
can you say about them?


8. Explain why

$$\frac{x^2 + 1}{x + 2} + \frac{x^4 + 1}{x - 3} = 0$$

has at least one root between -2 and 3.


9. Let $f(x) = \frac{A}{a^2 + x} + \frac{B}{b^2 + x} + \frac{C}{c^2 + x} - 1$, where A, B, and C are positive and
a > b > c > 0. Discuss the nature of the graph of y = f(x) and explain why the equation
f(x) = 0 has exactly three roots $x_1$, $x_2$, $x_3$ satisfying the inequalities
$-a^2 < x_1 < -b^2 < x_2 < -c^2 < x_3$.

10. Suppose that $f$ is continuous for all values of $x$, that $\lim_{x \to -\infty} f(x) = -1$, and
$\lim_{x \to +\infty} f(x) = 10$. Explain how you use Theorem IV to show that there is at least one
value of $x$ such that $f(x) = 0$.

11. Suppose that $f$ is given continuous for all $x$, with $f(x) < 0$ when $x < x_1$ and
$f(x) > 0$ when $x > x_2$, where $x_1 < x_2$. (a) Define a cut (see §2.4) in such a way that
the cut number $c$ satisfies the conditions $f(c) = 0$, $f(x) < 0$ if $x < c$. (b) Define a cut in such a
way that the cut number satisfies the conditions $f(c) = 0$, $f(x) > 0$ if $x > c$.

12. Prove Theorem III by using a repeated bisection method, as in the second proof
of Theorem II. At each bisection, retain a half interval on which the least upper bound of
$f$ is $M$. Let $\{I_n\}$ be the resulting nest of intervals. Choose a point $x_n$ in $I_n$ such that
$f(x_n) > M - (1/n)$. Why is this possible? What happens as $n \to \infty$? Write out the whole
argument carefully, showing how you are led to a point $c$ such that $f(c) = M$.

13. Consider the following theorem: If $f$ is continuous when $a \le x \le b$ and if
$f(a)f(b) < 0$ (i.e., if $f(a)$ and $f(b)$ are of unlike signs), then there is some point $c$ between $a$
and $b$ at which $f(c) = 0$. As we saw in §3.3, the proof of Theorem IV may be made very
easily once this theorem is proved. Give a proof, using the existence of a least upper
bound, as guaranteed in Theorem II, §2.7. Start by defining $S$ to be the set of all $x$ such
that $a \le x \le b$ and $f(x)f(a) > 0$. Why does $S$ have an upper bound? Let $c$ be this upper
bound and prove that $f(c) = 0$. Write out your whole argument clearly, with specific
justification for each step in the reasoning.

14. A function $f$ is defined and continuous when $0 \le x \le 1$, and $f(1) = 2$. The function
has the further property that the value $f(x)$ is always a rational number. Find $f(0)$.

15. A function $f$ is defined when $0 \le x \le 2$, with at most one point of discontinuity.
Furthermore, the value $f(x)$ is rational if $0 \le x < 1$ and irrational if $1 < x \le 2$. Why must $f$
have exactly one point of discontinuity? What is that point?

16. Let $f$ be defined on $[0, 1]$ as follows: $f(x) = x$ if $x$ is rational, $f(x) = 1 - x$ if $x$ is
irrational. (a) For what values of $x$ is $f$ continuous? (b) In spite of the fact that $f$ does not
satisfy the hypothesis of Theorem IV on $[0, 1]$, show that as $x$ varies from 0 to 1, $f(x)$
takes on every value between $f(0)$ and $f(1)$. What is $x$, for example, if $f(x) = \sqrt{2}/2$?

17. Let $f$ be defined for all $x$ by the conditions: $f(x) = 0$ if $x$ is irrational or if $x = 0$,
$f(x) = 1/n$ if $x$ is the rational number $m/n$, where $m \ne 0$, $n > 0$, and the fraction is reduced
to its lowest terms. Explain carefully (a) why $f$ is discontinuous at $x_0$ if $x_0$ is rational and not
zero, (b) why $f$ is continuous at $x_0$ if $x_0$ is irrational. (c) Is $f$ continuous at $x = 0$?

18. Suppose that $f$ is defined and has a continuous derivative when $a \le x \le b$, and
suppose that $f(a) = f(b) = 0$, but that $f(x)$ is not 0 for every $x$. Let $A$ be a constant
different from zero, and let $g(x) = f'(x) + Af(x)$. Prove that there is some number $\xi$ such
that $a < \xi < b$ and $g(\xi) = 0$. This is fairly hard. There are two cases: Case 1, in which $f(x)$
assumes both positive and negative values, and Case 2, in which $f(x)$ takes on values of
one sign only. The proof in Case 1 is easy. For Case 2 it is fairly easy to prove that
$g(\xi) = 0$ for some $\xi$ such that $a \le \xi \le b$, but it is much more difficult to prove the result as
originally stated, with $a < \xi < b$.

19. A function $f$ is continuous on $a \le x \le b$ and differentiable on $a < x < b$. Furthermore,
$f(a) = a$ and $f(b) = b$. Show that there are two points $x_1$, $x_2$ such that $a < x_1 < x_2 < b$ and

$$\frac{1}{f'(x_1)} + \frac{1}{f'(x_2)} = 2.$$

20. Suppose that $f$ is defined and differentiable on the closed interval $[a, b]$, with
$f'(a) > 0$, $f'(b) < 0$. Prove that $f'(\xi) = 0$ for some $\xi$ such that $a < \xi < b$. Show also that the
same conclusion may be reached if $f'(a) < 0$, $f'(b) > 0$.

21. There is a counterpart of Theorem IV for functions which need not be continuous,
but are known to be derivatives of continuous functions. The theorem reads as
follows: Suppose that $f$ is defined and differentiable on $[a, b]$, with $f'(a) \ne f'(b)$. Let $k$ be a
number between $f'(a)$ and $f'(b)$. Then $f'(\xi) = k$ at some point $\xi$ where $a < \xi < b$. This
theorem is known as Darboux’s theorem. Prove it by setting $g(x) = f(x) - kx$ and using
the results of Exercise 20.

22. Let $f$ be a function which is defined for all $x$, continuous at $x = 0$, and such that
$f(x + y) = f(x) + f(y)$ for all values of $x$ and $y$. Show that $f(x) = Cx$, where $C = f(1)$. Begin
by proving (a) that $f(m/n) = (m/n)f(1)$ if $m$ and $n$ are positive integers, (b) that $f(-x) =
-f(x)$, and (c) that $f(0) = 0$. Then note Exercise 3 and apply Exercise 4 to the function
$f(x) - xf(1)$.



4 / EXTENSIONS OF THE 
LAW OF THE MEAN 


4 / INTRODUCTION 

In this chapter we consider various generalizations of the law of the mean 
(Theorem IV, §1.2), and related topics. The most important extension of the law 
of the mean is Taylor’s formula with remainder. In elementary calculus Taylor’s 
formula is often closely associated with the study of expansions of functions in 
power series. There is indeed an important connection between Taylor’s formula 
and expansions in power series, but the formula is not important solely because 
of that connection. The chapter closes with a discussion of l’Hospital’s rule. 


4.1 / CAUCHY'S GENERALIZED LAW OF THE MEAN 

Cauchy’s generalization of the law of the mean deals with two functions instead 
of with just one. 

THEOREM I. Let $F(x)$ and $G(x)$ be continuous on the closed interval $a \le x \le b$,
and differentiable on the open interval $a < x < b$. Assume further that
$G(a) \ne G(b)$, and that $F'(x)$ and $G'(x)$ never vanish simultaneously. Then for
some value $x = X$ such that $a < X < b$ we have

$$\frac{F(b) - F(a)}{G(b) - G(a)} = \frac{F'(X)}{G'(X)}. \tag{4.1-1}$$

As a special case we obtain the ordinary law of the mean (Theorem IV, §1.2)
if we take $G(x) = x$.


Proof. As with the proof of the ordinary law of the mean, we appeal to
Rolle’s theorem (§1.2). Let us set

$$k = \frac{F(b) - F(a)}{G(b) - G(a)} \tag{4.1-2}$$

and define

$$\phi(x) = F(x) - F(a) - k[G(x) - G(a)].$$

Observe that $\phi(a) = 0$. Also, because of the definition of $k$, we see that $\phi(b) = 0$.
Now

$$\phi'(x) = F'(x) - kG'(x).$$

The function $\phi(x)$ satisfies the conditions of Rolle’s theorem; therefore $\phi'(X) = 0$
for some $X$, $a < X < b$. That is,

$$0 = F'(X) - kG'(X).$$

In this formula $G'(X) \ne 0$. For if $G'(X) = 0$ then $F'(X) = 0$ also, whereas we
assumed that these derivatives never vanish simultaneously. Thus we can write

$$k = \frac{F'(X)}{G'(X)},$$

a formula which is equivalent to (4.1-1) because of (4.1-2). This completes the
proof.


A geometrical interpretation of (4.1-1) may be given as follows: Let a plane
curve be represented parametrically by equations

$$y = F(t), \quad x = G(t), \quad a \le t \le b. \tag{4.1-3}$$

The slope of the curve for a given $t$ is

$$\frac{dy}{dx} = \frac{F'(t)}{G'(t)}. \tag{4.1-4}$$

The constant $k$ in (4.1-2) is the slope of the straight line
joining the points on the curve corresponding to $t = a$
and $t = b$, respectively. The theorem says that the two
slopes (4.1-2) and (4.1-4) are equal for at least one value
of $t$ between $a$ and $b$ (see Fig. 26).

The uses of Theorem I are largely in proving other 
theorems, notably Theorem III (§4.3) and Theorem VI 
(§4.5). 
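
Theorem I can be checked numerically in a simple case. The sketch below is an illustration only, not part of the text's argument: it assumes that $\phi'(x) = F'(x) - kG'(x)$ changes sign between the end-points (Rolle's theorem guarantees a zero of $\phi'$, but not a sign change at the ends), and it locates the zero by bisection. The function name `cauchy_mean_value_point` and the sample pair $F(t) = t^3$, $G(t) = t^2$ on $[1, 2]$ are hypothetical choices.

```python
def cauchy_mean_value_point(F, G, dF, dG, a, b, tol=1e-12):
    """Return (X, k) with F'(X) - k G'(X) = 0, where k is the chord slope (4.1-2)."""
    k = (F(b) - F(a)) / (G(b) - G(a))        # requires G(a) != G(b)
    dphi = lambda x: dF(x) - k * dG(x)       # phi'(x) from the proof
    lo, hi = a, b
    if dphi(lo) * dphi(hi) > 0:
        # Assumption of this sketch: phi' changes sign on [a, b].
        raise ValueError("phi' does not change sign; bisection does not apply")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dphi(lo) * dphi(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi), k


# Sample curve y = t^3, x = t^2 for 1 <= t <= 2.
F = lambda t: t ** 3
G = lambda t: t ** 2
dF = lambda t: 3 * t ** 2
dG = lambda t: 2 * t

X, k = cauchy_mean_value_point(F, G, dF, dG, 1.0, 2.0)
print(X, k, dF(X) / dG(X))
```

For this pair the chord slope is $k = 7/3$ and $F'(t)/G'(t) = 3t/2$, so the point found should agree with $X = 14/9$.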



EXERCISE 

Cauchy’s generalized mean-value formula (4.1-1) may be written in the form

$$\begin{vmatrix} F(a) & G(a) & 1 \\ F(b) & G(b) & 1 \\ F'(X) & G'(X) & 0 \end{vmatrix} = 0.$$

This suggests the following more general theorem:

Suppose that $F$, $G$, $H$ are continuous when $a \le x \le b$ and differentiable when
$a < x < b$. Then there is a value $X$ such that $a < X < b$ and

$$\begin{vmatrix} F(a) & G(a) & H(a) \\ F(b) & G(b) & H(b) \\ F'(X) & G'(X) & H'(X) \end{vmatrix} = 0.$$

Prove this theorem by considering

$$\phi(x) = \begin{vmatrix} F(a) & G(a) & H(a) \\ F(b) & G(b) & H(b) \\ F(x) & G(x) & H(x) \end{vmatrix}.$$




4.2 / TAYLOR’S FORMULA WITH INTEGRAL REMAINDER 


Consider a polynomial $P(x)$ of degree $n$:

$$P(x) = b_0 x^n + b_1 x^{n-1} + \cdots + b_n, \quad n \ge 0, \tag{4.2-1}$$

where $b_0 \ne 0$. If we choose any particular value of $x$, say $x = a$, it is possible to
express $P(x)$ as a sum of powers of $(x - a)$, the highest power being $n$:

$$P(x) = c_0 (x - a)^n + c_1 (x - a)^{n-1} + \cdots + c_n. \tag{4.2-2}$$

That this is the case may be seen as follows: It is clearly true if $n = 0$, for in that
case the two expressions for $P(x)$ are identical in form. For $n = 1$ we have a
linear function $P(x) = b_0 x + b_1$, and we wish to express it in the form $c_0 (x - a) + c_1$.
Choosing $c_0 = b_0$, we have

$$P(x) - b_0 (x - a) = b_1 + a b_0,$$

so that $P(x) = b_0 (x - a) + c_1$ with $c_1 = b_1 + a b_0$. In general, we proceed by induction,
assuming that the desired type of representation is possible with
polynomials of degree $n - 1$, where $n \ge 1$. Then for the polynomial (4.2-1) we
choose $c_0 = b_0$, so that $P(x) - b_0 (x - a)^n$ is a polynomial of degree at most $n - 1$.
Hence we can express this polynomial as a sum of powers of $(x - a)$:

$$P(x) - b_0 (x - a)^n = c_1 (x - a)^{n-1} + \cdots + c_n.$$

This is equivalent to (4.2-2), and completes the induction proof.

Once we know that the representation (4.2-2) is possible, it is very easy to
find convenient formulas for $c_0, \ldots, c_n$. Let us differentiate (4.2-2) $k$ times,
where $0 \le k \le n$ [if $k = 0$, $P^{(k)}(x)$ means $P(x)$]. After doing this we set $x = a$. In
this process the only term of $P(x)$ which leads to a non-zero result is
$c_{n-k}(x - a)^k$. Therefore

$$P^{(k)}(a) = \left\{ \frac{d^k}{dx^k} \left[ c_{n-k} (x - a)^k \right] \right\}_{x=a} = k!\, c_{n-k},$$

where, according to the usual convention, $0! = 1$.
Consequently

$$c_0 = \frac{P^{(n)}(a)}{n!}, \quad c_1 = \frac{P^{(n-1)}(a)}{(n-1)!}, \quad \ldots, \quad c_n = P(a),$$

so that (4.2-2) may be written in the form

$$P(x) = P(a) + P'(a)(x - a) + \frac{P''(a)}{2!}(x - a)^2 + \cdots + \frac{P^{(n)}(a)}{n!}(x - a)^n, \tag{4.2-3}$$

where we have reversed the order of the terms to suit our convenience.
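
The re-expansion of a polynomial in powers of $(x - a)$ can also be carried out mechanically. The following is a minimal sketch, assuming integer coefficients listed from the highest power down; the names `shift_polynomial` and `derivative_at` are hypothetical. It uses repeated synthetic division (a standard technique, not the induction argument of the text) and then checks the relation $k!\,c_{n-k} = P^{(k)}(a)$ derived above.

```python
from math import factorial


def shift_polynomial(b, a):
    """Given b_0..b_n (highest power first), return c_0..c_n with
    P(x) = c_0 (x-a)^n + c_1 (x-a)^(n-1) + ... + c_n."""
    c = list(b)
    n = len(c) - 1
    # Each pass is one synthetic division by (x - a); the remainders
    # accumulate at the tail of the list as c_n, c_{n-1}, ...
    for i in range(n):
        for j in range(1, n + 1 - i):
            c[j] += a * c[j - 1]
    return c


def derivative_at(b, k, a):
    """P^(k)(a), obtained by differentiating b_0 x^n + ... + b_n term by term."""
    n = len(b) - 1
    total = 0
    for i, coeff in enumerate(b):
        power = n - i
        if power >= k:
            # d^k/dx^k of x^power is power!/(power-k)! * x^(power-k)
            total += coeff * (factorial(power) // factorial(power - k)) * a ** (power - k)
    return total


# Sample polynomial P(x) = x^3 - 2x + 5, expanded about a = 2.
b = [1, 0, -2, 5]
a = 2
c = shift_polynomial(b, a)
n = len(b) - 1
for k in range(n + 1):
    print(k, c[n - k], derivative_at(b, k, a) // factorial(k))
```

Here $P(2) = 9$, $P'(2) = 10$, $P''(2)/2! = 6$, $P'''(2)/3! = 1$, so $P(x) = (x-2)^3 + 6(x-2)^2 + 10(x-2) + 9$.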

Now let us ask whether there is any counterpart of this formula when the
polynomial $P(x)$ is replaced by a function $f(x)$ which is not a polynomial. That
is, let us ask what relation the expression

$$f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n$$

bears to $f(x)$. Naturally we assume that $f$ can be differentiated $n$ times at $x = a$.
Let us assume more than that, however. A convenient assumption is that $f$ and
its first $n + 1$ derivatives are continuous in a closed interval containing $x = a$.
Now by Theorem VIII, §1.53, we know that

$$\int_a^x f'(t)\,dt = f(x) - f(a).$$

Let us write this in the form

$$f(x) = f(a) + \int_a^x f'(t)\,dt. \tag{4.2-4}$$

We transform the integral by integration by parts, taking

$$u = f'(t), \quad dv = dt, \quad du = f''(t)\,dt, \quad v = -(x - t).$$

Thus

$$\int_a^x f'(t)\,dt = -f'(t)(x - t)\Big|_a^x + \int_a^x f''(t)(x - t)\,dt,$$

and so

$$f(x) = f(a) + f'(a)(x - a) + \int_a^x f''(t)(x - t)\,dt. \tag{4.2-5}$$

We can now integrate by parts again, taking

$$u = f''(t), \quad dv = (x - t)\,dt, \quad du = f^{(3)}(t)\,dt, \quad v = -\frac{(x - t)^2}{2}.$$

The integral in (4.2-5) now becomes

$$\int_a^x f''(t)(x - t)\,dt = -f''(t)\frac{(x - t)^2}{2}\Big|_a^x + \int_a^x f^{(3)}(t)\frac{(x - t)^2}{2}\,dt,$$

and so

$$f(x) = f(a) + f'(a)(x - a) + f''(a)\frac{(x - a)^2}{2} + \frac{1}{2}\int_a^x f^{(3)}(t)(x - t)^2\,dt.$$

It is now clear that repeated integration by parts will lead to the formula

$$f(x) = f(a) + f'(a)(x - a) + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n + \frac{1}{n!}\int_a^x (x - t)^n f^{(n+1)}(t)\,dt. \tag{4.2-6}$$

This is the generalization of (4.2-3) which we have been seeking. The function
$f(x)$ has been expressed as a polynomial of degree $n$ in $(x - a)$, plus a remainder
term.
We state our findings in a formal theorem. 

THEOREM II. Let $f(x)$ and its first $n + 1$ derivatives ($n \ge 0$) be continuous in a
closed interval containing $x = a$ (either inside or at one end). Let $x$ be any
point of this interval. Then

$$f(x) = f(a) + f'(a)(x - a) + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n + R_{n+1}, \tag{4.2-7}$$

the remainder being given by

$$R_{n+1} = \frac{1}{n!}\int_a^x (x - t)^n f^{(n+1)}(t)\,dt. \tag{4.2-8}$$

We have indicated the procedure for proving the theorem by successive
integration by parts, starting from (4.2-4). If one wishes, he may give the proof
more formally by mathematical induction.

Formula (4.2-7) is called Taylor's formula with remainder. Various formulas
for the remainder may be given, as we shall see in §4.3.

The size of the remainder may sometimes be estimated from (4.2-8). Thus,
for example, if $|f^{(n+1)}(t)| \le M$ when $a \le t \le x$, we can see that

$$|R_{n+1}| \le \frac{M}{n!}\int_a^x (x - t)^n\,dt = \frac{M(x - a)^{n+1}}{(n + 1)!}.$$

4.3 / OTHER FORMS OF THE REMAINDER 

It is possible to obtain (4.2-7) with a different formula for $R_{n+1}$, under slightly
less stringent assumptions. It will suffice to assume merely that $f^{(n+1)}(x)$ exists on
an interval, without necessarily being continuous.

THEOREM III. Let $f$ and its first $n$ derivatives be continuous when $a \le x \le b$
(where $a < b$), and let the $(n + 1)$st derivative $f^{(n+1)}(x)$ exist when $a < x < b$.
Then there is a value $x = X$, $a < X < b$, such that

$$f(b) = f(a) + f'(a)(b - a) + \cdots + \frac{f^{(n)}(a)}{n!}(b - a)^n + \frac{f^{(n+1)}(X)}{(n + 1)!}(b - a)^{n+1}. \tag{4.3-1}$$

The same formula holds in case $b < a$, all the inequalities then being reversed.

This is a generalization of the law of the mean, and actually coincides with
the law of the mean in the special case $n = 0$.

It is difficult to give a proof of (4.3-1) which will seem well motivated and
free from artifice. The following proof has been discovered as a result of careful
study and a certain amount of trial and error.

We define two functions

$$F(x) = f(b) - f(x) - f'(x)(b - x) - \cdots - \frac{f^{(n)}(x)}{n!}(b - x)^n \tag{4.3-2}$$

and

$$G(x) = \frac{(b - x)^{n+1}}{(n + 1)!}. \tag{4.3-3}$$

Observe that $F(b) = G(b) = 0$. In calculating the derivative of $F(x)$ we find that
a great deal of cancellation occurs between terms arising from the differentiation
of the right member of (4.3-2). Thus

$$\frac{d}{dx}\left[-f'(x)(b - x)\right] = -f''(x)(b - x) + f'(x).$$

The $f'(x)$ here cancels the derivative of the previous term, $-f(x)$; the term
$-f''(x)(b - x)$ is canceled by one of the terms coming from the differentiation of
$-\frac{f''(x)}{2!}(b - x)^2$. The final result, which the student should verify, is

$$F'(x) = -\frac{f^{(n+1)}(x)}{n!}(b - x)^n. \tag{4.3-4}$$

We also have

$$G'(x) = -\frac{(b - x)^n}{n!}. \tag{4.3-5}$$

Let us now apply Cauchy’s form of the law of the mean (4.1-1). Since $F(b)$ and
$G(b)$ are zero, it reads

$$\frac{F(a)}{G(a)} = \frac{F'(X)}{G'(X)}, \quad \text{or} \quad F(a) = \frac{F'(X)}{G'(X)}\,G(a).$$

Taking account of (4.3-3), (4.3-4), and (4.3-5), this may be written

$$F(a) = f^{(n+1)}(X)\,\frac{(b - a)^{n+1}}{(n + 1)!}. \tag{4.3-6}$$

If we now put $x = a$ in (4.3-2) and use (4.3-6), we obtain the desired formula
(4.3-1). This proof is valid whether $a < b$ or $b < a$, because Cauchy’s formula
(4.1-1) is unaffected by an interchange of $a$ and $b$.

The formula in Theorem III is written in a variety of different ways by
changes in notation. One important form commonly occurring in the literature is
obtained by putting $b = a + h$, where $h$ may be either positive or negative. The
number $X$ between $a$ and $a + h$ may then be written in the form $X = a + \theta h$,
where $\theta$ is some number such that $0 < \theta < 1$. Thus we have

$$f(a + h) = f(a) + f'(a)h + \cdots + \frac{f^{(n)}(a)}{n!}h^n + \frac{f^{(n+1)}(a + \theta h)}{(n + 1)!}h^{n+1}. \tag{4.3-7}$$

Another form results by putting $b = x$ in (4.3-1). In this form we may write

$$f(x) = f(a) + f'(a)(x - a) + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n + R_{n+1}, \tag{4.3-8}$$

with

$$R_{n+1} = \frac{f^{(n+1)}(X)}{(n + 1)!}(x - a)^{n+1}, \tag{4.3-9}$$

where $X$ lies between $x$ and $a$.

Formula (4.3-9) is called Lagrange's form of the remainder.

Brook Taylor (English) published in 1715 the form of the infinite series
(4.3-10) which bears his name today. The formulas (4.3-8) and (4.3-9) were
derived by J. L. Lagrange (French) in 1797. Lagrange’s proof utilized integrals.
The first proof using the law of the mean appears to have been by Ampère in
1806.

The student should distinguish between (4.3-8) and Taylor’s series, which is
the infinite-series formula

$$f(x) = f(a) + f'(a)(x - a) + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n + \cdots. \tag{4.3-10}$$

This expansion of a function in powers of $(x - a)$ is valid under certain
circumstances. Proofs of such validity in particular cases may be made with the
aid of Taylor’s formula and one or the other of the forms of the remainder.
Power-series developments are considered systematically in Chapter 21.


Example 1. Write Taylor’s formula for $f(x) = 1/(2 + x)$ with $a = -1$ and
$n = 2$, using Lagrange’s form of the remainder.

We have $f(-1) = 1$ and

$$f'(x) = \frac{-1}{(2 + x)^2}, \quad f'(-1) = -1; \qquad f''(x) = \frac{2}{(2 + x)^3}, \quad f''(-1) = 2; \qquad f^{(3)}(x) = \frac{-6}{(2 + x)^4}.$$

Thus (4.3-8) becomes

$$\frac{1}{2 + x} = 1 - (x + 1) + (x + 1)^2 + R_3$$

with

$$R_3 = \frac{-6}{(2 + X)^4}\,\frac{(x + 1)^3}{3!} = -\frac{(x + 1)^3}{(2 + X)^4},$$

where $X$ is between $x$ and $-1$. If we wish to estimate $R_3$ we observe that
$1/(2 + X)$ lies between $1/(2 + x)$ and 1, and hence $|R_3|$ lies between
$\frac{|x + 1|^3}{(2 + x)^4}$ and $|x + 1|^3$. For example, if $x = -0.9$, $R_3$ is negative, and in absolute value between
$0.001/(1.1)^4$ and $0.001$.
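
The numerical claim at the end of Example 1 is easy to confirm directly. The following short sketch (variable names are illustrative only) evaluates the remainder $R_3$ at $x = -0.9$ as the difference between $1/(2 + x)$ and the quadratic Taylor polynomial about $a = -1$, and compares it with the two bounds $|x + 1|^3/(2 + x)^4$ and $|x + 1|^3$.

```python
x = -0.9

exact = 1.0 / (2.0 + x)                         # f(x) = 1/(2 + x)
poly = 1.0 - (x + 1.0) + (x + 1.0) ** 2         # Taylor polynomial, a = -1, n = 2
r3 = exact - poly                               # the remainder R_3

lower = abs(x + 1.0) ** 3 / (2.0 + x) ** 4      # bound from 1/(2 + X) > 1/(2 + x)
upper = abs(x + 1.0) ** 3                       # bound from 1/(2 + X) < 1

print(r3, lower, upper)
```

The printed remainder should be negative, with absolute value strictly between the two bounds.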


Example 2. Write Taylor’s formula for $f(x) = \log(1 + x)$ with $a = 0$ and a
general value of $n$. Estimate $R_{n+1}$ if $0 \le x \le 1$.

We have $f(0) = 0$ and

$$f'(x) = \frac{1}{1 + x}, \quad f'(0) = 1, \qquad f^{(n)}(x) = \frac{(-1)^{n-1}(n - 1)!}{(1 + x)^n}.$$

The general formula for $f^{(n)}(x)$ may be surmised after the first few instances and
verified by induction. Thus

$$\log(1 + x) = x - \frac{1}{2}x^2 + \cdots + (-1)^{n-1}\frac{x^n}{n} + R_{n+1}, \tag{4.3-11}$$

with

$$R_{n+1} = \frac{(-1)^n}{n + 1}\left(\frac{x}{1 + X}\right)^{n+1}, \tag{4.3-12}$$

where $X$ is between 0 and $x$. Of course we must have $0 < 1 + x$, or $-1 < x$. If
$0 \le x \le 1$, then

$$0 \le \frac{x}{1 + X} \le 1,$$

and so

$$|R_{n+1}| \le \frac{1}{n + 1}.$$

There are many other possible formulas for $R_{n+1}$ besides that of Lagrange.
Probably the most important of these other forms is that due to Cauchy. It
appears in formula (4.3-13), which follows.

THEOREM IV. An alternative form of the remainder in (4.3-1), with hypotheses
as in Theorem III, is

$$R_{n+1} = \frac{f^{(n+1)}(X)}{n!}(b - X)^n (b - a), \tag{4.3-13}$$

where $X$ is some number between $a$ and $b$ (in general different from the $X$ of
(4.3-1)). With $b = a + h$, $X = a + \theta h$ ($0 < \theta < 1$), this takes the form

$$R_{n+1} = \frac{f^{(n+1)}(a + \theta h)}{n!}h^{n+1}(1 - \theta)^n. \tag{4.3-14}$$

This last form is an alternative to the last term in (4.3-7), but the $\theta$'s in the two
forms are usually different.

Proof. We start with $F(x)$ as defined by (4.3-2), but instead of using (4.3-3)
we define $G(x) = b - x$. Then, following the method of proof of Theorem III, we
have (4.3-4) and the formula $G'(x) = -1$ in place of (4.3-5).

Just as before we have, for some $X$ between $a$ and $b$,

$$F(a) = \frac{F'(X)}{G'(X)}\,G(a),$$

or

$$F(a) = \frac{f^{(n+1)}(X)}{n!}(b - X)^n (b - a).$$

This is equivalent to (4.3-13), for comparison of (4.3-2) and (4.3-1) shows that
the remainder term is $R_{n+1} = F(a)$.

Other forms of the remainder, as well as other proofs of Theorems III and
IV, are indicated in the exercises.

Example 3. Find Cauchy’s form of $R_{n+1}$ for $f(x) = \log(1 + x)$ with $a = 0$.
Using the work of Example 2, and putting $h = x$, $a = 0$ in (4.3-14), we find

$$R_{n+1} = (-1)^n \frac{x^{n+1}(1 - \theta)^n}{(1 + \theta x)^{n+1}}. \tag{4.3-15}$$

Example 4. Show that, in Taylor’s formula (4.3-11) for $\log(1 + x)$, the following
estimates of the remainder hold:

$$|R_{n+1}| < \frac{|x|^{n+1}}{1 + x} \quad \text{if } -1 < x < 0, \tag{4.3-16}$$

$$|R_{n+1}| < \frac{x^{n+1}}{n + 1} \quad \text{if } 0 < x \le 1. \tag{4.3-17}$$

We use Cauchy’s form of the remainder to get (4.3-16), and Lagrange’s form
to get (4.3-17). If $-1 < x < 0$ we write (4.3-15) in the form

$$R_{n+1} = (-1)^n \frac{x^{n+1}}{1 + \theta x}\left(\frac{1 - \theta}{1 + \theta x}\right)^n$$

and observe that $1 + \theta x > 1 + x$ and

$$0 < \frac{1 - \theta}{1 + \theta x} < 1.$$

The inequality (4.3-16) is then seen to be correct. On the other hand, if $0 < x \le 1$,
we observe that $0 < X < x$ and hence $1 + X > 1$ in (4.3-12). The inequality
(4.3-17) is then seen to be correct.

We observe from the foregoing inequalities that $R_{n+1} \to 0$ as $n \to \infty$ if $-1 < x < 0$
or if $0 < x \le 1$. Of course $R_{n+1} = 0$ if $x = 0$. It follows that Taylor’s formula,
neglecting the remainder, gives a better and better approximation to $\log(1 + x)$
when $n$ is increased, provided that $-1 < x \le 1$. Accordingly, we have the
Taylor’s series expansion, valid if $-1 < x \le 1$:

$$\log(1 + x) = x - \frac{1}{2}x^2 + \cdots + (-1)^{n-1}\frac{x^n}{n} + \cdots. \tag{4.3-18}$$
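
The remainder estimates for $\log(1 + x)$ can be tested numerically. The sketch below is an illustration only: it compares the actual error of the partial sum $x - x^2/2 + \cdots + (-1)^{n-1}x^n/n$ with the bound $|x|^{n+1}/(1 + x)$ for $-1 < x < 0$ (from Cauchy's form) and the bound $x^{n+1}/(n + 1)$ for $0 < x \le 1$ (from Lagrange's form). The function names are hypothetical, and `math.log1p` is used as the reference value.

```python
import math


def log_partial_sum(x, n):
    """x - x^2/2 + ... + (-1)^(n-1) x^n / n."""
    return sum((-1) ** (k - 1) * x ** k / k for k in range(1, n + 1))


def remainder_bound(x, n):
    """Upper bound for |R_{n+1}| on -1 < x <= 1."""
    if -1 < x < 0:
        return abs(x) ** (n + 1) / (1 + x)      # Cauchy's form of the remainder
    if 0 < x <= 1:
        return x ** (n + 1) / (n + 1)           # Lagrange's form of the remainder
    if x == 0:
        return 0.0
    raise ValueError("the estimates apply only for -1 < x <= 1")


for x in (-0.5, 0.5, 1.0):
    for n in (5, 10, 20):
        error = abs(math.log1p(x) - log_partial_sum(x, n))
        print(x, n, error, remainder_bound(x, n))
```

In each printed row the actual error should not exceed the corresponding bound; at $x = 1$ the convergence is visibly slow, as the bound $1/(n + 1)$ suggests.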


EXERCISES 

1. Arrange x 4 in powers of (x - 3). 

2. Write Taylor’s formula with Lagrange’s remainder in the case of f(x ) = ^ 2 ^ 3 
with a = 1, n = 2. 

3. Write Taylor’s formula with Lagrange’s remainder in each of the following cases: 

(a) /(x) = sin 2 x, a = 0, n = 3 ; 

(b) f(x) - tan x, a = 0, n = 3; 

(c) /(x) = <r* J , a = 0, n =3; 

(d) $f(x) = \log(1 + e^x)$, $a = 0$, $n = 2$;

(e) /(x) = i5fi^>,a = 0, n = l. 

4. Find $R_5$ in (4.3-9) with $f(x) = \sin x$, $a = \pi/2$.

5. Write Taylor’s formula for $f(x) = \sin x$ with $a = 0$. Show that


Show that the same inequality holds for /( x) = cos x, a = 0. 

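6. Show that

$$e^x = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + R_{n+1},$$

with

$$0 < R_{n+1} < e^x \frac{x^{n+1}}{(n + 1)!} \quad \text{if } 0 < x$$

and

$$|R_{n+1}| < \frac{|x|^{n+1}}{(n + 1)!} \quad \text{if } x < 0.$$

7. Show that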


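$$(1 - x)^{-1/2} = 1 + \frac{1}{2}x + \frac{1 \cdot 3}{2 \cdot 4}x^2 + \cdots + \frac{1 \cdot 3 \cdots (2n - 1)}{2 \cdot 4 \cdots (2n)}x^n + R_{n+1},$$

and write both Lagrange’s and Cauchy’s form of the remainder. Show that

$$|R_{n+1}| < \frac{1 \cdot 3 \cdots (2n + 1)}{2 \cdot 4 \cdots (2n + 2)}|x|^{n+1} \quad \text{if } -1 < x < 0$$

and

$$|R_{n+1}| < \frac{1 \cdot 3 \cdots (2n + 1)}{2^{n+1}\, n!}\,\frac{x^{n+1}}{(1 - x)^{3/2}} \quad \text{if } 0 < x < 1.$$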


8. Observe that the curves $y = \sin x$, $y = Ax$ intersect near $x = \pi$ if $A$ is small. Let
$f(x) = \sin x - Ax$ and apply Taylor’s formula with $a = \pi$, $n = 2$, assuming that $x$ is near $\pi$
and neglecting $R_3$. Use this result to show that an approximate solution of $\sin x = Ax$ is
$x = \pi/(1 + A)$.

9. Proceed as in Exercise 8 to find an approximate formula for the solution of
$\operatorname{ctn} x = Ax$ near $x = \pi/2$ (assuming that $A$ is small).

10. Suppose that $f$ is twice differentiable in the interval $a < x < b$ and that $f''(x) \ge 0$
at each point. If $a < x_0 < b$ and $y_0 = f(x_0)$, show that in the given interval no point of the
curve $y = f(x)$ is below the line which is tangent to the curve at $(x_0, y_0)$. In particular, if
$f'(x_0) = 0$, the function has a minimum value at $x_0$.

11. In the method of proof for Theorem III let the function $G(x)$ in (4.3-3) be
replaced by $G(x) = (x - b)^p$. Then carry on the method and show that the expression for
the last term in (4.3-7) is replaced by

$$R_{n+1} = \frac{f^{(n+1)}(a + \theta h)}{n!\,p}\,h^{n+1}(1 - \theta)^{n+1-p}, \quad 0 < \theta < 1.$$

This is called Schlömilch’s form of the remainder. Lagrange’s form is the special case
$p = n + 1$, and Cauchy’s form is the special case $p = 1$.

12. The following suggestions provide a method of proving Taylor’s formula with
remainder without appeal to Cauchy’s generalized law of the mean. Suppose that $g$ is
continuous on the closed interval $[a, a + h]$ and differentiable on the open interval
$(a, a + h)$. Assume further that $g$ has the properties (i) $g(a) = 1$, (ii) $g(a + h) = 0$,
(iii) $g'(x) \ne 0$ if $a < x < a + h$. Define $F(x)$ by (4.3-2) with $a + h$ in place of $b$. Define
$\Phi(x) = F(x) - g(x)F(a)$. Apply Rolle’s theorem to $\Phi$ and hence obtain a formula for
$R_{n+1} = F(a)$. The result is

$$f(a + h) = f(a) + f'(a)h + \cdots + \frac{f^{(n)}(a)}{n!}h^n - \frac{f^{(n+1)}(a + \theta h)\,h^n (1 - \theta)^n}{n!\,g'(a + \theta h)}.$$

If we choose

$$g(x) = \left(\frac{a + h - x}{h}\right)^{n+1}$$

we obtain (4.3-7) (Lagrange’s form). The choice $g(x) = \frac{a + h - x}{h}$ leads to Cauchy’s form
(4.3-14).


4.4 / AN EXTENSION OF THE 

MEAN VALUE THEOREM FOR INTEGRALS 

Theorem VI of §1.51 may be generalized somewhat as follows: 

THEOREM V. Let $f(x)$ and $p(x)$ be continuous functions defined on the closed
interval $a \le x \le b$, and suppose that $p(x) \ge 0$ for every $x$ of the interval. Then
there is some number $X$ such that $a \le X \le b$ and

$$\int_a^b f(x)p(x)\,dx = f(X)\int_a^b p(x)\,dx. \tag{4.4-1}$$

Proof. Let $m$ and $M$ be the minimum and maximum values of $f(x)$ on the
interval $[a, b]$. Then

$$m\,p(x) \le f(x)p(x) \le M\,p(x)$$

and so

$$m\int_a^b p(x)\,dx \le \int_a^b f(x)p(x)\,dx \le M\int_a^b p(x)\,dx. \tag{4.4-2}$$

There are two cases to consider:

$$(1)\ \int_a^b p(x)\,dx > 0; \qquad (2)\ \int_a^b p(x)\,dx = 0.$$

In case (1) we have

$$m \le \frac{\int_a^b f(x)p(x)\,dx}{\int_a^b p(x)\,dx} \le M.$$

Since the quotient of the integrals lies between the extreme values attained by
$f(x)$ on the interval, it follows by Theorem IV of §3.3 that there must be some
point $x = X$ on the interval such that

$$f(X) = \frac{\int_a^b f(x)p(x)\,dx}{\int_a^b p(x)\,dx}.$$

This is equivalent to (4.4-1).

In case (2) we see by (4.4-2) that the left member of (4.4-1) is zero. The right
member is also zero, no matter how we choose $X$. Thus the proof is complete.
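
A small numerical illustration of Theorem V may help fix ideas. The sketch below is hedged: it assumes case (1) of the proof (the integral of $p$ is positive) and, for convenience only, that $f$ is increasing on $[a, b]$, so that the point $X$ can be found by bisection. The names `simpson` and `weighted_mean_point` and the sample choice $f(x) = x^2$, $p(x) = x$ on $[0, 1]$ are hypothetical.

```python
def simpson(g, lo, hi, panels=1000):
    """Composite Simpson rule; `panels` must be even."""
    h = (hi - lo) / panels
    total = g(lo) + g(hi)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3


def weighted_mean_point(f, p, a, b, tol=1e-12):
    """Return (X, ratio) with f(X) = ratio = int(f p) / int(p).

    Assumes int(p) > 0 and that f is increasing on [a, b]."""
    ratio = simpson(lambda x: f(x) * p(x), a, b) / simpson(p, a, b)
    lo, hi = a, b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < ratio:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), ratio


X, ratio = weighted_mean_point(lambda x: x * x, lambda x: x, 0.0, 1.0)
print(X, ratio)
```

For this choice the two integrals are $1/4$ and $1/2$, so the quotient is $1/2$ and the point in question is $X = 1/\sqrt{2}$.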


As one application of Theorem V we shall show how Lagrange’s form of the
remainder in Taylor’s formula (see (4.3-9)) may be deduced from the integral
form (4.2-8) of the remainder. Suppose first that $a < x$. Then the function $(x - t)^n$
is $\ge 0$ when $a \le t \le x$. Hence we can apply Theorem V to the integral in (4.2-8),
taking $p(t) = (x - t)^n$. Thus

$$\int_a^x (x - t)^n f^{(n+1)}(t)\,dt = f^{(n+1)}(X)\int_a^x (x - t)^n\,dt = f^{(n+1)}(X)\left[-\frac{(x - t)^{n+1}}{n + 1}\right]_a^x = f^{(n+1)}(X)\,\frac{(x - a)^{n+1}}{n + 1},$$

where $X$ is some number such that $a \le X \le x$. When this result is put back into
(4.2-8) we obtain the Lagrange form (4.3-9). If $x < a$ we write (4.2-8) in the form

$$R_{n+1} = \frac{(-1)^{n+1}}{n!}\int_x^a (t - x)^n f^{(n+1)}(t)\,dt.$$

If we take $p(t) = (t - x)^n$ in this integral, then $p(t) \ge 0$, and we can apply
Theorem V. We leave it for the student to carry out the details of showing that
we are once more led to formula (4.3-9).

It should be pointed out that in the present section we are assuming the
continuity of $f^{(n+1)}(x)$, whereas in §4.3 we merely assumed the existence of
$f^{(n+1)}(x)$.


4.5 / L’HOSPITAL’S RULE


Theorem XIV of §1.64 states that

$$\lim_{x \to c}\frac{f(x)}{g(x)} = \frac{\lim_{x \to c} f(x)}{\lim_{x \to c} g(x)},$$

provided that $\lim_{x \to c} f(x)$ and $\lim_{x \to c} g(x)$ both exist and $\lim_{x \to c} g(x) \ne 0$. There are
many instances in which these conditions are not fulfilled, however, e.g.,


$$\lim_{x \to 0}\frac{e^x - \cos x}{\tan x} = ?, \qquad \lim_{x \to +\infty}\frac{\log x}{x} = ?, \qquad \lim_{x \to 0}\frac{e^{-1/x^2}}{x} = ?$$

The important cases not covered by the theorem of §1.64, and not otherwise
easily disposed of, are of two kinds. The first kind is that in which $f(x)$ and $g(x)$
both approach 0 as $x \to c$. The second kind is that in which $|g(x)| \to \infty$ as $x \to c$. In
both cases it may be that $x \to c$ from one side only; it may also happen that $x \to c$
is replaced by $x \to +\infty$ or $x \to -\infty$.

There is a useful rule for finding the limit of a quotient in the most
commonly occurring instances of the cases just mentioned. This rule was popularized
by the Marquis de l’Hospital in his textbook on calculus published in
1696, and is generally known as l’Hospital’s rule. The exact statement of the rule
actually amounts to the statement of two theorems, or of one theorem with two
alternative assumptions leading to the same conclusion. Before coming to the
formal statement of the theorem, we need to make a few preliminary remarks.
We shall have two functions $f(x)$ and $g(x)$ to consider. Let $c$ denote either a real
number or one of the symbols $+\infty$, $-\infty$. We assume that $f$ and $g$ are defined on a
portion of the $x$-axis which we denote by $I$, and we assume that, if $c$ is a real
number, $I$ is a finite open interval with $c$ as one of its end-points, while if $c$ is
$+\infty$ or $-\infty$, $I$ is a semi-infinite open interval, extending indefinitely in the positive
direction if $c = +\infty$, and indefinitely in the negative direction if $c = -\infty$. We then
talk about limits as $x \to c$, it being understood that $x$ is to range over $I$, and to
approach $c$ from one side only. We furthermore assume that the derivatives
$f'(x)$, $g'(x)$ exist at each point of $I$, and that $g(x)$ and $g'(x)$ are never equal to
zero.

THEOREM VI. (L’Hospital's rule.) Suppose either that

Case 1. $f(x) \to 0$ and $g(x) \to 0$ as $x \to c$,
or that

Case 2. $|g(x)| \to \infty$ as $x \to c$.

Let $A$ denote either a real number or one of the symbols $+\infty$, $-\infty$. Suppose that

$$\lim_{x \to c}\frac{f'(x)}{g'(x)} = A. \tag{4.5-1}$$

Then it is also true that

$$\lim_{x \to c}\frac{f(x)}{g(x)} = A. \tag{4.5-2}$$

Proof. The proof of Theorem VI is easiest when $I$ is a finite interval and
Case 1 is assumed. Indications as to the procedure to be followed are given in
Exercise 8. The situation is less simple if $c$ is $+\infty$ or $-\infty$, or if we assume Case 2.
We shall give the proof in such a way that the argument does not depend on
whether or not $I$ is a finite interval, and also so that the arguments for Case 1
and Case 2 are very much alike.

We take $x$ and $y$ to be any distinct points of $I$, with $y$ between $x$ and $c$. Then,
by Cauchy’s generalized law of the mean (Theorem I, §4.1), there is some point
$X$ between $x$ and $y$ such that

$$\frac{f(x) - f(y)}{g(x) - g(y)} = \frac{f'(X)}{g'(X)}. \tag{4.5-3}$$

Observe that $X$ also is between $x$ and $c$. We are assured that $g(x) \ne g(y)$ by the
ordinary law of the mean, since $g'(x)$ is never zero, by hypothesis.

Now consider Case 1, and rewrite (4.5-3) in the form

$$\frac{\dfrac{f(x)}{g(x)} - \dfrac{f(y)}{g(x)}}{1 - \dfrac{g(y)}{g(x)}} = \frac{f'(X)}{g'(X)}. \tag{4.5-4}$$

Now suppose, for definiteness, that $A$ is not $+\infty$ or $-\infty$. Suppose $\epsilon > 0$. The
meaning of (4.5-1) is that there is some number $x_0$ such that, if $x$ is between $x_0$
and $c$,

$$A - \epsilon < \frac{f'(x)}{g'(x)} < A + \epsilon. \tag{4.5-5}$$

If $x$ is such a number, so is $X$, and hence, no matter how $y$ is chosen, it follows
from (4.5-4) that

$$A - \epsilon < \frac{\dfrac{f(x)}{g(x)} - \dfrac{f(y)}{g(x)}}{1 - \dfrac{g(y)}{g(x)}} < A + \epsilon.$$

Now make $y \to c$. Then, since we are assuming Case 1, we conclude that

$$A - \epsilon \le \frac{f(x)}{g(x)} \le A + \epsilon.$$

This holds if $x$ is between $x_0$ and $c$, where $x_0$ may depend on $\epsilon$. But this means
that (4.5-2) is true.

The argument is essentially the same if $A$ is $+\infty$ or $-\infty$. For example, if
$A = +\infty$, we take any number $M$, and choose $x_0$ so that

$$M < \frac{f'(x)}{g'(x)}$$

if $x$ is between $x_0$ and $c$. Then we find that, for such values of $x$,

$$M \le \frac{f(x)}{g(x)}.$$

So we get (4.5-2) from (4.5-1) in this case also. The student may wish to review
the definitions of limits involving the symbols $+\infty$, $-\infty$; these definitions are
found in §1.61.

To treat Case 2, we return to (4.5-3) and write it in the form

$$\frac{\dfrac{f(y)}{g(y)} - \dfrac{f(x)}{g(y)}}{1 - \dfrac{g(x)}{g(y)}} = \frac{f'(X)}{g'(X)}. \tag{4.5-6}$$

As in the previous argument, we may suppose the assumption (4.5-1)
expressed in the form (4.5-5), with the consequence from (4.5-6) that

$$A - \epsilon < \frac{\dfrac{f(y)}{g(y)} - \dfrac{f(x)}{g(y)}}{1 - \dfrac{g(x)}{g(y)}} < A + \epsilon$$

for all $x$ between $x_0$ and $c$ and all $y$ between $x$ and $c$. We shall now let $y \to c$,
keeping $x$ fixed. Since $|g(y)| \to \infty$, we may safely assume that

$$1 - \frac{g(x)}{g(y)} > 0.$$

We then see that

$$(A - \epsilon)\left[1 - \frac{g(x)}{g(y)}\right] + \frac{f(x)}{g(y)} < \frac{f(y)}{g(y)} < (A + \epsilon)\left[1 - \frac{g(x)}{g(y)}\right] + \frac{f(x)}{g(y)}.$$

As $y \to c$, the left and right members of this set of inequalities approach $A - \epsilon$
and $A + \epsilon$, respectively. Certainly then it will be true that, for $y$ beyond a certain
point,

$$A - 2\epsilon < \frac{f(y)}{g(y)} < A + 2\epsilon.$$

This, however, is equivalent to saying that

$$\lim_{y \to c}\frac{f(y)}{g(y)} = A,$$

which is what we set out to prove. As in Case 1, the argument is not very different
if $A$ is $+\infty$ or $-\infty$.

Example 1. Take $f(x) = e^x - \cos x$ and $g(x) = \tan x$ with $c = 0$. The conditions
of Theorem VI, Case 1, are fulfilled with $x$ approaching zero from either
side. Thus

$$\lim_{x \to 0}\frac{e^x - \cos x}{\tan x} = \lim_{x \to 0}\frac{e^x + \sin x}{\sec^2 x} = \frac{1 + 0}{1} = 1.$$

If, in attempting to apply Theorem VI, it should turn out that we have
$\lim_{x \to c} f'(x) = 0$ and $\lim_{x \to c} g'(x) = 0$, or that $\lim_{x \to c} |g'(x)| = \infty$, it may be that the
rule can be applied a second time.


Example 2. Find $\displaystyle\lim_{x \to 0}\frac{e^x + e^{-x} - 2}{3x^2}$.

Here one application of Theorem VI gives

$$\lim_{x \to 0}\frac{e^x + e^{-x} - 2}{3x^2} = \lim_{x \to 0}\frac{e^x - e^{-x}}{6x}.$$

A second application gives

$$\lim_{x \to 0}\frac{e^x - e^{-x}}{6x} = \lim_{x \to 0}\frac{e^x + e^{-x}}{6} = \frac{2}{6} = \frac{1}{3}.$$

Hence the original limit is equal to $\frac{1}{3}$.

The effect of Theorem VI is to replace the original problem of finding
$\lim_{x \to c}\frac{f(x)}{g(x)}$ by the new problem of finding $\lim_{x \to c}\frac{f'(x)}{g'(x)}$, provided this latter limit exists
or is $+\infty$ or $-\infty$. Any legitimate methods may be brought to bear on the new
problem, including algebraic or trigonometric transformations of the problem,
breaking the quotient up into a product of simpler quotients, repeated use of
Theorem VI, etc.


Example 3. Find $\displaystyle\lim_{x \to 0}\frac{(e^x - 1)\sin x}{\cos x - \cos^2 x}$.

We first apply Theorem VI:

$$\lim_{x \to 0}\frac{(e^x - 1)\sin x}{\cos x - \cos^2 x} = \lim_{x \to 0}\frac{(e^x - 1)\cos x + e^x \sin x}{-\sin x + 2\cos x \sin x}.$$

Next we simplify the new quotient:

$$\frac{(e^x - 1)\cos x + e^x \sin x}{-\sin x + 2\cos x \sin x} = \frac{\cos x}{2\cos x - 1}\cdot\frac{e^x - 1}{\sin x} + \frac{e^x}{2\cos x - 1}.$$

Now

$$\lim_{x \to 0}\frac{\cos x}{2\cos x - 1} = \frac{1}{2 - 1} = 1, \qquad \lim_{x \to 0}\frac{e^x}{2\cos x - 1} = \frac{1}{2 - 1} = 1,$$

and

$$\lim_{x \to 0}\frac{e^x - 1}{\sin x} = \lim_{x \to 0}\frac{e^x}{\cos x} = \frac{1}{1} = 1.$$

Hence the original limit is equal to 2.

The next example illustrates Case 1 with $c = +\infty$.

Example 4. Find $\lim_{x \to +\infty} x \log \dfrac{x + 1}{x - 1}$.

Here we write

$$x \log \frac{x + 1}{x - 1} = \frac{\log \dfrac{x + 1}{x - 1}}{\dfrac{1}{x}},$$

taking $f(x) = \log \dfrac{x + 1}{x - 1}$, $g(x) = \dfrac{1}{x}$. It is clear that
Theorem VI is applicable. Thus

$$\lim_{x \to +\infty} \frac{\log \dfrac{x + 1}{x - 1}}{1/x} = \lim_{x \to +\infty} \frac{\log(x + 1) - \log(x - 1)}{1/x} = \lim_{x \to +\infty} \frac{\dfrac{1}{x + 1} - \dfrac{1}{x - 1}}{-1/x^2}.$$

On simplification, the new limit takes the form

$$\lim_{x \to +\infty} \frac{2x^2}{x^2 - 1} = 2,$$

so that the original limit has the value 2.
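As a rough numerical check of Example 4 (a sketch only; the helper name is an assumption made for illustration), the expression can be tabulated for increasing x. The rewriting (x + 1)/(x - 1) = 1 + 2/(x - 1) lets the standard-library function `math.log1p` be used, which keeps the logarithm accurate when its argument is close to 1.

```python
import math

def example4(x: float) -> float:
    """x * log((x + 1) / (x - 1)) for x > 1."""
    # (x + 1) / (x - 1) = 1 + 2 / (x - 1), so log1p avoids cancellation.
    return x * math.log1p(2.0 / (x - 1.0))

# The tabulated values should settle near 2 as x grows.
for x in (10.0, 100.0, 1000.0):
    print(x, example4(x))
```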

Next we illustrate Case 2 of Theorem VI.

Example 5. Find $\lim_{x \to +\infty} \dfrac{e^x}{x^{10}}$.

Case 2 of Theorem VI is applicable, but it must be applied ten times before
we reach a result. We have

$$\lim_{x \to +\infty} \frac{e^x}{x^{10}} = \lim_{x \to +\infty} \frac{e^x}{10x^9} = \lim_{x \to +\infty} \frac{e^x}{10 \cdot 9x^8} = \cdots = \lim_{x \to +\infty} \frac{e^x}{10!} = +\infty.$$


It might at first sight appear as though we could solve the problem of
Example 5 by use of Case 1. For we can write

$$\frac{e^x}{x^{10}} = \frac{x^{-10}}{e^{-x}},$$

and if $f(x) = x^{-10}$, $g(x) = e^{-x}$, the conditions of Case 1 are satisfied. But when we
differentiate, things get worse instead of better:

$$\lim_{x \to +\infty} \frac{x^{-10}}{e^{-x}} = \lim_{x \to +\infty} \frac{-10x^{-11}}{-e^{-x}} = \lim_{x \to +\infty} \frac{11 \cdot 10x^{-12}}{e^{-x}} = \cdots.$$


As one of the very interesting applications of l'Hospital's rule, let us discuss
the function $e^{-1/x^2}$.

Example 6. Let us define $f(x) = e^{-1/x^2}$ if $x \ne 0$, and $f(0) = 0$. Then all the
derivatives of $f(x)$ have the value 0 at $x = 0$, as we shall now show.

In the first place, $f$ is continuous at $x = 0$, for $f(x) \to 0$ as $x \to 0$ (see Example
6, §1.1). Now, by definition,

$$f'(0) = \lim_{x \to 0} \frac{f(x) - f(0)}{x - 0} = \lim_{x \to 0} \frac{e^{-1/x^2}}{x}.$$

This appears to be a place to use Case 1 of Theorem VI, but the application of
the method gives

$$\lim_{x \to 0} \frac{e^{-1/x^2}}{x} = \lim_{x \to 0} \frac{2x^{-3} e^{-1/x^2}}{1} = \lim_{x \to 0} \frac{2e^{-1/x^2}}{x^3},$$


and the new limit is harder to deal with, rather than easier. We may, however,
use Case 2 of Theorem VI, as follows:

$$\lim_{x \to 0} \frac{e^{-1/x^2}}{x} = \lim_{x \to 0} \frac{1/x}{e^{1/x^2}} = \lim_{x \to 0} \frac{-\dfrac{1}{x^2}}{-\dfrac{2}{x^3} e^{1/x^2}} = \lim_{x \to 0} \tfrac{1}{2} x e^{-1/x^2} = 0.$$

This shows that $f'(0) = 0$.

To deal with higher derivatives we observe that, if $x \ne 0$,

$$f'(x) = 2x^{-3} e^{-1/x^2},$$

$$f''(x) = 4x^{-6} e^{-1/x^2} - 6x^{-4} e^{-1/x^2}.$$

It is easy to see, by induction, that $f^{(n)}(x)$ is a linear combination of terms of the
form

$$\frac{e^{-1/x^2}}{x^m}$$

with $0 < m \le 3n$. Consequently, to see that $f^{(n)}(0) = 0$, as well as to see that $f^{(n)}(x)$
is continuous at $x = 0$, it will be sufficient to show that

$$\lim_{x \to 0} \frac{e^{-1/x^2}}{x^m} = 0 \qquad (4.5\text{-}7)$$


for all positive integers $m$. We prove (4.5-7) by repeated use of Case 2 of
Theorem VI:

$$\lim_{x \to 0} \frac{x^{-m}}{e^{1/x^2}} = \lim_{x \to 0} \frac{-m x^{-m-1}}{-2x^{-3} e^{1/x^2}} = \frac{m}{2} \lim_{x \to 0} \frac{x^{-m+2}}{e^{1/x^2}};$$

after a finite number of steps the exponent in the numerator will be positive, and
then the limit is seen to be 0.
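The limit (4.5-7) says that the exponential factor overwhelms any power of x in the denominator. A small numerical sketch, with an illustrative helper name that is not from the text, makes the rate of decay visible; it is a check on plausibility, not a substitute for the argument above.

```python
import math

def flat_quotient(x: float, m: int) -> float:
    """e^{-1/x^2} / x^m for x != 0, the expression in (4.5-7)."""
    return math.exp(-1.0 / (x * x)) / x ** m

# For each m, the values fall off very rapidly as x approaches 0.
for m in (1, 3, 6):
    print(m, [flat_quotient(x, m) for x in (0.5, 0.2, 0.1)])
```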

The graph of $y = e^{-1/x^2}$ is indicated in Fig. 27. It is very
flat (but of course not perfectly straight) in the neighborhood of the origin.

There are various kinds of functions whose limiting
values may be found by using suitable devices to bring the
problem to a form where l'Hospital's rule is applicable. The
principal types of problems and the appropriate devices are
indicated in Exercises 3 and 4.

Fig. 27. $y = e^{-1/x^2}$.



EXERCISES 

1. Evaluate the following limits: 


, . sin 3x 

(a) lim -r -r~ 

v x ^ 0 + 1 - cos 4x 


, , ,. 1 , 1 + x 

(c) hm — log — - 

x— *0 A 1 A 


(b) lim 


tan x - sin x 


(d) lim 


x sin x 
‘o (1 - cos x) 2 


(e) lim 

x— *0 


tan x — x 


(f) lim 


log(l + e~ x ) 
x 3 (log xf


2. Evaluate the following limits: 

(a) lim — >n a positive integer. (d) lim 

x— *+°° € x -*•+<*> t 

_ (logx)" ..... , . log(l + e 2 *) 

(b) lim — s — 1 — ’ n a positive integer, (e) lim — 

X-* + °° X X-*+“> X 


(c) lim 


log x 
+® \/ 1 + x 3 


(f) lim 


log(l + xe 2x ) 


3. One sometimes wants to find the limit of a function of the form $y = [F(x)]^{G(x)}$ as $x$
approaches some limit $x_0$, or as $x \to \infty$. In such problems $F(x)$ is positive. If $F(x)$ and $G(x)$
both approach nonzero limits, the problem presents no unusual difficulty. There are,
however, three cases in which the limit of $y$ is not apparent without some investigation:

Case 1. $F(x) \to 1$ and $G(x) \to \infty$;

Case 2. $F(x) \to \infty$ and $G(x) \to 0$;

Case 3. $F(x) \to 0$ and $G(x) \to 0$.

In these cases it is usually appropriate to investigate the limit of the logarithm of $y$:

$$\log y = G(x) \log F(x) = \frac{\log F(x)}{1/G(x)}.$$

The rule of l'Hospital may then be applicable. If $\log y \to b$, then $y \to e^b$.
Use this procedure to evaluate the following limits:


(a) lim x 

x-*0+ 

(b) lim jc 


1/x 


< d > “■?.(' + i ;)"• 

(e) lim (e x + e 2 *) 1 '*. 


(c) limf^L] . (f) lim (logy) 

x — *0 \ X / x— >0+ \ X J 


4. Sometimes a limit is not easy to determine because it is of the type
$\lim_{x \to x_0} (f(x) - g(x))$, where $f(x)$ and $g(x)$ both become infinite as
$x \to x_0$ (or as $x \to \infty$). In practice the best plan for such cases is usually
to bring the expression $f(x) - g(x)$ to the form of a single quotient, e.g.,

$$\lim_{x \to 0} \left( \frac{1}{x} - \frac{1}{\sin x} \right) = \lim_{x \to 0} \frac{\sin x - x}{x \sin x}.$$

The limit may then be treated by l'Hospital's rule, or by devices of algebraic or
trigonometric reduction. In some cases an algebraic device alone will be sufficient, e.g.,

$$\lim_{x \to +\infty} \left( x - \sqrt{x^2 + 1} \right) = \lim_{x \to +\infty} \frac{(x - \sqrt{x^2 + 1})(x + \sqrt{x^2 + 1})}{x + \sqrt{x^2 + 1}} = \lim_{x \to +\infty} \frac{-1}{x + \sqrt{x^2 + 1}} = 0.$$

Evaluate the following limits:

(a) limf- — r~ — Y (d) lim(: — ,} r

x^o\x sinx/ x^o\log(l + x)

(b) lim( — J V). (e) lim(-^- \ Y

x^oVxsinx x J x-o\x tan xj

(c) lim f— — log — Y (f) lim x(Vx 2 + a 2 -x).

X->0+ \X X / X-»+o O

5. Suppose that $f$ and $g$ have continuous derivatives of the first $n$ orders in a closed
interval $a \le x \le b$. Furthermore, assume that $f(a) = f'(a) = \cdots = f^{(n-1)}(a) = 0$,
$g(a) = g'(a) = \cdots = g^{(n-1)}(a) = 0$, and that $g^{(n)}(a) \ne 0$. Without using
l'Hospital's rule, show that

$$\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{f^{(n)}(a)}{g^{(n)}(a)}.$$

Use Taylor's formula with remainder.


6. What can you conclude in Theorem VI, if $\lim \dfrac{f'(x)}{g'(x)}$ fails to exist either as a definite
numerical limit or as $+\infty$ or $-\infty$? Give your answer after a careful examination of the
limits

$$\lim_{x \to 0} \frac{x^2 \sin \dfrac{1}{x}}{\sin x} \qquad \text{and} \qquad \lim_{x \to \infty} \frac{x - \sin x}{x}.$$


7. Evaluate each of the following limits: 

1 1 ( x 
(a) lim x sin— ■ (c) lim — sin 2 1 dt. 

jC-»oo X JC-++OG X J 1 


(b) lim — 

x->+® x 


fMogt 

J. l + t 


dt. (d) lim 

X— O 


e -(1 + x) 


l lx 


8. Give a proof of Case 1 of Theorem VI, assuming that $c$ is a real number, and
utilizing the following suggestions: Let $b$ be a point of the interval, J, and consider the
functions $f$, $g$ on the closed interval with end-points $b$ and $c$, after defining
$f(c) = g(c) = 0$. Now apply Theorem I, §4.1, and make the deduction of (4.5-2) from (4.5-1). Explain
the argument carefully. Why do we define $f(c) = g(c) = 0$?


MISCELLANEOUS EXERCISES 

1. Suppose that $f$ is defined and differentiable in an interval containing $x = a$ (the
point $x = a$ may be at one end of the interval, in which case derivatives at $x = a$ are to be
considered as one-sided limits). Suppose also that $f''(a)$ exists, but assume nothing else
about second derivatives. Show that

$$f''(a) = \lim_{x \to a} \frac{f(x) - f(a) - (x - a)f'(a)}{\dfrac{(x - a)^2}{2!}}.$$

2. Suppose that $f$ satisfies the conditions of Exercise 1, and that $x = a$ is an interior
point of the interval in question. Show that, if $f$ has a relative minimum at $x = a$, then
$f''(a) \ge 0$, while $f''(a) \le 0$ if $f$ has a relative maximum at $x = a$. These are necessary
conditions for a relative extreme at $x = a$. Now assume that $f'(a) = 0$ and $f''(a) > 0$, and
prove that $f$ must have a relative minimum at $x = a$. These are sufficient conditions for a
relative minimum. State a set of sufficient conditions for a relative maximum at $x = a$, and
prove the sufficiency of the conditions.

3. Generalize the result of Exercise 1, obtaining

$$f^{(n)}(a) = \lim_{x \to a} \frac{f(x) - f(a) - (x - a)f'(a) - \cdots - \dfrac{(x - a)^{n-1}}{(n - 1)!} f^{(n-1)}(a)}{\dfrac{(x - a)^n}{n!}},$$

with the assumptions that $f$ is defined and has derivatives of orders $1, \ldots, n - 1$ in an interval
including $x = a$, while $f^{(n)}(a)$ exists. Suggestion: Apply Theorem VI $(n - 1)$ times. Use this
result to show that there is a function $E_n(x)$ defined in the interval, except at $x = a$, such that

$$f(x) = f(a) + (x - a)f'(a) + \cdots + \frac{(x - a)^{n-1}}{(n - 1)!} f^{(n-1)}(a) + \frac{(x - a)^n}{n!} \{ f^{(n)}(a) + E_n(x) \}$$

and $\lim_{x \to a} E_n(x) = 0$. The function $E_n(x)$ depends upon the function $f(x)$, of course. Find
$E_2(x)$ explicitly if $f(x)$ is defined as $x^4 \sin(1/x)$ when $x \ne 0$, and $f(0) = 0$; take $a = 0$.

4. With the conditions of Exercise 3 on $f$ suppose that $f'(a) = \cdots = f^{(n-1)}(a) = 0$ and
that $f$ has a relative minimum at $x = a$. Also suppose that $n$ is even. Show that $f^{(n)}(a) \ge 0$.
What if there is a relative maximum at $x = a$? What about sufficient conditions for a
relative maximum or minimum when $f'(a) = \cdots = f^{(n-1)}(a) = 0$? What can you say if $n$ is
odd, $f'(a) = \cdots = f^{(n-1)}(a) = 0$ and $f^{(n)}(a) \ne 0$? (Assume that $x = a$ is an interior point of
the interval.)

5. Use the result of Exercise 3 to prove the following theorem:

Let $f$ and $g$ be defined and have derivatives of orders $1, \ldots, n - 1$ in an interval
including $x = a$, while $f^{(n)}(a)$ and $g^{(n)}(a)$ exist and $g^{(n)}(a) \ne 0$. Suppose that both $f$ and $g$,
as well as their first $n - 1$ derivatives, have the value 0 at $x = a$. Then

$$\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{f^{(n)}(a)}{g^{(n)}(a)}.$$

This theorem holds for $n \ge 1$. It may give a result when Theorem VI is inapplicable,
e.g., in the case $f(x) = x^2 \sin(1/x)$, $f(0) = 0$, $g(x) = x$.

6. Calculate the following limits:

(a) $\lim_{x \to 0+} x(\log x)^n$, $n$ a positive integer. (b) $\lim_{x \to 1} x^{1/(1-x)}$.

7. Discuss $\lim_{x \to +\infty} (a^x + b^x)^{1/x}$, assuming $0 < a \le b$. Generalize to the case of
$\lim_{x \to +\infty} (a_1^x + a_2^x + \cdots + a_n^x)^{1/x}$, where $0 < a_1 \le a_2 \le \cdots \le a_n$.

8. Find $\lim_{x \to 0} \dfrac{x \int_0^x e^{t^2}\,dt}{1 - e^{x^2}}$.

9. If $f(x) = e^{-1/x^2}$, show that $f^{(n)}(x) = e^{-1/x^2} P_n(1/x)$, where $P_n(t)$ is a polynomial of
degree $3n$ in $t$, with leading term $2^n t^{3n}$. Show also that $P_{n+1}(t) = 2t^3 P_n(t) - t^2 P_n'(t)$.

10. Show that to each positive integer n corresponds a number 6 n , 0< 6 n < 1, such 

that log(l +£)-£-£• 

11. Apply (4.3-7) to /(x)=l/(l-x) with a = 0, h = x, and find the form of the 
remainder term. Then compare with the algebraic indentity 


1 -x 


= l + x+- • • + x" + 


1 X 


and so find that 0 = 


1 - 0 -*) 


l/(n + 2) 


(where x < 1). Now find lim 6. 


12. Suppose that $f$ is differentiable in an interval of which $x = a$ is an interior point,
and assume that $f''(a)$ exists. Show that

$$f''(a) = \lim_{h \to 0} \frac{f(a + 2h) - 2f(a + h) + f(a)}{h^2}.$$

This may be proved using the result of Exercise 1. If one assumes that the second
derivative exists on the whole interval and is continuous at $x = a$, the proof may be given
by using (4.3-7) in a suitable way.



5 / FUNCTIONS OF 
SEVERAL VARIABLES 


5 / FUNCTIONS AND THEIR REGIONS OF DEFINITION 

Thus far in this book we have dealt with functions of a single independent 
variable. But we do not go far in either pure or applied mathematics until we 
have occasion to consider functions of two or more variables. We assume that 
the student has some familiarity with the concept of a function of several 
independent variables. 

One of the first things that claims our attention when we begin to study 
functions of several variables is the nature of the region of definition of such a 
function. The functions of one variable which we study in calculus are usually 
defined on intervals of the real axis. There are only a few different types of 
intervals. If the interval is finite, it may contain both its end-points, or just one, 
or neither. If the interval is infinite, but is not the entire axis, it has just one 
end-point, and this may or may not be counted as belonging to the interval. 
There is much more variety in the case of functions of several variables. We 
shall give some illustrative examples, taking the number of independent vari- 
ables to be two. 

Example 1. $f(x, y) = \log(1 - x^2 - y^2)$.

The function is defined only when $x^2 + y^2 < 1$, since otherwise the logarithm
is undefined. The region of definition is the interior of the unit circle with center
at the origin. In Fig. 28 the circle is dashed to indicate that the boundary of the
circular area does not belong to the region of definition.




Example 2. $F(x, y) = \sqrt{x^2 + y^2 - 1} + \log(4 - x^2 - y^2)$.

Here we must have $x^2 + y^2 \ge 1$ in order for the square root to be real, while
we must have $x^2 + y^2 < 4$ for the logarithm to be defined. The region of definition
of $F(x, y)$ is the annular region between the circles $x^2 + y^2 = 1$ and $x^2 + y^2 = 4$.
The inner circumference is part of the region of definition, while the outer
circumference is not (see Fig. 29).

Example 3 . g(x, y) = 

The function is defined except when the denominator is zero, that is,
everywhere except at the points of the parabola $y^2 = 4x$ (see Fig. 30).



Fig. 30. Fig. 31. 


Example 4. $G(x, y) = \sqrt{x^2 - y^2} + \sqrt{x^2 + y^2 - 1}$.

The region of definition here is defined by the inequalities $x^2 \ge y^2$, $x^2 + y^2 \ge 1$.
The lines $x - y = 0$, $x + y = 0$ divide the plane into four quadrants. The inequality
$x^2 \ge y^2$ states that the point $(x, y)$ lies in (or on the edge of) one of those two of
the four quadrants which contain the x-axis. The other inequality states that
$(x, y)$ lies outside or on the circle $x^2 + y^2 = 1$. Hence the region of definition of
$G(x, y)$ is that part of the xy-plane which is shaded in Fig. 31.

Similar examples might be given for functions of three independent vari- 
ables. The region of definition might be the interior of a cube, the interior and 
boundary of an ellipsoid, the space between two concentric spheres, or the 
interior of a surface formed like the inner tube of a bicycle tire. 

Because of the great variety of possible regions of definition of a function of 
two or more variables, it is desirable to devote some attention to matters of 
terminology about configurations of points in the plane. Not only will this make 
it easier for us to state things clearly, but it will eventually become absolutely 
indispensable in developing parts of our subject. We shall sometimes use the 
word “domain” for “region of definition,” and by the range of a function we 
shall mean the set of values which the function takes on. 

5.1 / POINT SETS 

In §2.7 we explained the meaning of the phrase “a set of real numbers.” Since 
we identify real numbers with points on the axis of reals, we may equally well 
speak of sets of points on a line. We are now going to talk about sets of points,
or point sets, in the plane or in space of three dimensions. For the sake of 
simplicity and definiteness we shall speak principally of point sets in the plane. 

In dealing with intervals we found it necessary to introduce the notions of 
closed and open intervals. Likewise, in our discussion of regions of definition of 
a function of several variables, we find it necessary to introduce and use the 
concepts open set and closed set . Furthermore, it is well to give a precise 
definition of the boundary of a set. It is much more difficult to define these 
concepts carefully than one might at first suppose, because of the great variety 
of configurations of points which are available for consideration. We proceed as 
follows: First let us define the phrase a circular neighborhood of the point
$(x_0, y_0)$ to mean the set of all points $(x, y)$ lying inside some circle with center at
$(x_0, y_0)$. That is, if $\delta > 0$, the set of all $(x, y)$ such that

$$(x - x_0)^2 + (y - y_0)^2 < \delta^2$$

constitutes a circular neighborhood of $(x_0, y_0)$ (see Fig. 32). A
neighborhood may have any positive number as its radius. The
further concepts which we are now going to introduce are built
upon the concept of a neighborhood. It is convenient to abandon
the coordinate notation for the time being, and denote points by
such symbols as $P, Q, P_1, P_2, \ldots$.

Definition . A set S is called open if each point P of S has some 
circular neighborhood which belongs entirely to the set S. 

Example 1. A circular neighborhood, for instance the set of
all points inside the circle $x^2 + y^2 = 1$, is an open set. For, if P is
inside the unit circle, say at a distance $r$ from the center (where
$r < 1$), the circular neighborhood of P of radius $\delta$ also lies inside
the unit circle provided $\delta$ is chosen so that $0 < \delta < 1 - r$ (see Fig. 33).
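The choice of radius in Example 1 can be illustrated by sampling. The sketch below is an assumption-laden illustration rather than a proof: the function names, the sample point P = (0.6, 0.3), and the polar sampling grid are all chosen here for convenience. It tests whether sampled points of a neighborhood of P stay inside the unit circle, once with a radius below 1 - r and once with a radius that is too large.

```python
import math

def in_unit_disc(x: float, y: float) -> bool:
    """True when (x, y) lies strictly inside the circle x^2 + y^2 = 1."""
    return x * x + y * y < 1.0

def neighborhood_in_unit_disc(px: float, py: float, delta: float, steps: int = 60) -> bool:
    """Sample the open disc of radius delta about (px, py) on a polar grid
    and report whether every sampled point lies inside the unit disc."""
    for i in range(steps):
        rho = delta * i / steps              # radii 0 <= rho < delta
        for j in range(steps):
            theta = 2.0 * math.pi * j / steps
            x = px + rho * math.cos(theta)
            y = py + rho * math.sin(theta)
            if not in_unit_disc(x, y):
                return False
    return True

p = (0.6, 0.3)                # hypothetical point P inside the unit circle
r = math.hypot(*p)            # distance of P from the center
print(neighborhood_in_unit_disc(*p, delta=0.9 * (1.0 - r)))   # radius below 1 - r
print(neighborhood_in_unit_disc(*p, delta=2.0 * (1.0 - r)))   # radius too large
```

A finite sample can only fail to find a counterexample; the inequality 0 < δ < 1 - r together with the triangle inequality is what actually guarantees containment.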

Example 2. The set of all points in the plane not on the parabola $y^2 = 4x$ is
an open set. The student should verify this for himself.

Example 3. Let S consist of all points on or inside the circle $x^2 + y^2 = 1$. This
set is not open, for if P is on the circle there is no neighborhood of P which
belongs entirely to S.

Definition . If S is any point set , the set of all points of the plane which are not in 
S is called the complement of S. On occasion it is convenient to use the notation 
C(S) for the complement of S. 

To illustrate this notion, consider the sets defined in the foregoing examples.
For the S of Example 1, C(S) is the set of all points outside or on the circle
$x^2 + y^2 = 1$, i.e., all points for which $x^2 + y^2 \ge 1$. In Example 2, C(S) consists of
the points on the parabola $y^2 = 4x$. In Example 3, C(S) is the exterior of the
circle, i.e., all points such that $x^2 + y^2 > 1$.

Definition. A set S is called closed if its complement is open.

The set of Example 3 is closed (the student should verify this to his own 
satisfaction); the sets of Examples 1 and 2 are not closed. A set consisting of any 
finite number of points is closed. A set may be neither open nor closed, as we 
see in the next example. 

Example 4. Let S be the set of all points for which $1 \le x^2 + y^2 < 4$. This set is
the region of definition of the function $F(x, y)$ of Example 2, §5. It is not open,
because a point of the circle $x^2 + y^2 = 1$ has no circular
neighborhood which belongs entirely to S (see Fig. 29). The
complement of S has two parts: the set of all points for which
$x^2 + y^2 < 1$, and the set of all points for which $x^2 + y^2 \ge 4$. It is
easily seen that C(S) is not open, for a point on the circle
$x^2 + y^2 = 4$ has no circular neighborhood which belongs
entirely to C(S). Therefore S is not closed. The set C(S) is
shown in Fig. 34.

If S is a set, the complement of C(S) is S itself. Hence,
by definition, C(S) is closed if S is open. Thus, if one of the
two sets S, C(S) is open, the other is closed.

Definition. If S is a point set , a point P is called a boundary point of S if every 
neighborhood of P contains at least one point of S and one point of the 
complement C(S). The collection of all boundary points of S is called the 
boundary of S. We denote it by B(S). 

The sets introduced in Examples 1-4 have the following boundaries: 

Example 1. B(S) is the circle $x^2 + y^2 = 1$.

Example 2. B(S) is the parabola $y^2 = 4x$.

Example 3. B(S) is the circle $x^2 + y^2 = 1$.

Example 4. B(S) consists of the two circles $x^2 + y^2 = 1$ and $x^2 + y^2 = 4$.

It is clear from the definition of boundary that a set S and its complement 
C(S) have the same boundary. If a set S is open, no boundary point of S is 
actually in S. If S is closed, B(S) is part of S. These statements may be verified 
by referring to the definitions. In Example 4, B(S) is partly in S and partly in
C(S). 

Definition. A point P of a set S is called an interior point of S if there is some 
circular neighborhood of P which belongs entirely to S. The interior of a set S is 
the set consisting of all interior points of S. 


An open set consists entirely of interior points. A set may have no interior
points. 

We are going to be most interested in sets of points which are of one of the 
following kinds: 

(a) Open.

(b) A closed set formed by an open set together with its boundary.

(c) The boundary of an open set. 

Furthermore, the sets which we consider will usually have boundaries which are 
composed of one or more curves or segments of curves (straight lines included), 
or possibly of isolated points. 

Definition . By a region we shall mean a set of points which is either a nonempty 
open set or such a set together with some or all of the points forming its 
boundary. 


All the sets occurring as domains of definition of the functions in Examples 
1-4 of §5 are regions in the sense just defined. Of these regions, those of 
examples 1 and 3 are open; that of Example 4 is closed (i.e., the entire boundary 
is part of the region); that of Example 2 is neither open nor closed (it contains 
only one of the two circles forming the boundary). 

If we wish to deal with point sets in three-dimensional space we define 
spherical neighborhoods instead of circular neighborhoods. Once this is done 
we can define open set, closed set, and boundary just as in the case of point sets 
in the plane. These concepts also apply to point sets on a line. In that case we 
define a neighborhood of $x_0$ as an open interval $x_0 - \delta < x < x_0 + \delta$,
where $\delta$ is any positive number.

Finally, we remark that we could use square neighborhoods
instead of circular neighborhoods without affecting
the whole development of the concepts of open sets, closed
sets, boundary of a set, and so forth. A square neighborhood
of $(x_0, y_0)$ is the set of all points inside a square with center
$(x_0, y_0)$ (see Fig. 35). If the square has sides of length $2\delta$ the
neighborhood is defined by the inequalities

$$|x - x_0| < \delta, \qquad |y - y_0| < \delta.$$



In the future we shall often speak of neighborhoods without the adjectives 
circular or square. It will usually be immaterial which is meant or which the 
student chooses to think of on any particular occasion. 

Recall that the union of any collection of sets consists of all points belonging 
to at least one of them, and the intersection consists of all points belonging to 
each of them. Two sets are said to be disjoint in case they have no points in 
common. 


Example 5. An open rectangle is a rectangle without its boundary. It consists
entirely of interior points. For a rectangle with each side parallel to a coordinate
axis, to say that $R$ is open means that there are number pairs $a, b$ with $a < b$ and $c, d$
with $c < d$ such that $R$ is the set of all points $(x, y)$ for which $a < x < b$ and
$c < y < d$. The same thing is frequently said more briefly as follows:

$$R = \{(x, y) : a < x < b \text{ and } c < y < d\}.$$

A property of open rectangles which we shall find useful later in proving the 
inverse function theorem (in Chapter 12) is brought out in the following: 
Assertion: No open rectangle R is the union of two nonempty disjoint open 
subsets. 

When we undertake to justify this very plausible assertion, we see that the 
key to a proof is a good understanding of the property of openness. Let us reason 
by contradiction and begin by supposing that R is the union of two nonempty 
disjoint open subsets, A and B. Now consider the line segment [PQ] connecting 
a point P in A to a point Q in B. Since R is an open rectangle, the line segment 
[PQ] obviously lies entirely in R. Since the subset A to which P belongs is 
open, there is some positive number r such that the disc of radius r centered at 
P is contained in A. Therefore there is some interval extending along [PQ], from 
P toward Q, which lies in A. Since B also is open, the same argument shows 
that all points of the line segment [PQ] which are sufficiently close to Q must, 
like Q itself, lie in B. 

Now let D be the set of all numbers which are distances from P of points on 
[PQ] which belong to A. Then D is a set of nonnegative numbers which is 
bounded above, since the distance from P to Q is obviously one upper bound. 
By the least upper bound property of the real numbers (§2.7), D must have a 
least upper bound; call it d. By the preceding paragraph, we know that d is a 
positive number less than the distance from P to Q. From now on, we shall 
concentrate our attention on that point C of [PQ] which is at the distance d 
from P. Obviously C must belong to R, yet we shall soon see that it cannot 
belong to either A or B. This contradiction will prove the Assertion. 

Suppose first that C belongs to A. Then C must be farther than any other
point of A, that is also in [PQ], from the point P. Since A is open there must be
some disc centered at C and contained in A. But this implies that there are
points of A on the segment [PQ] which are farther than C is from P, which is a
contradiction. Now try assuming that C belongs to B. Clearly no point between
C and Q can belong to A. Since B is open, there is some disc of positive radius
$\rho$ centered at C and contained in B. But this would imply that $d - \rho$ is an upper
bound for D, contradicting the fact that $d$ is the least upper bound. This
completes the proof.


EXERCISES 

1. The set S consists of all points (x, y) such that $x^2 + y^2 < 1$ and $x < 0$ if $y = 0$.
Describe S in geometrical language, with the aid of a figure. Is S open, closed, or neither?
What is the boundary of S?

2. The set S consists of all points (x, y) such that either $x^2 + y^2 = 1$ or $y = 0$ and
$0 \le x \le 1$. Does this set have any interior points? Is it closed?

3. The set S consists of all points (x, y ) such that y ^ x 2 and y ^ 1. Draw a figure, and 
describe S in geometrical language. Is S open, closed, or neither? What is the boundary 
of S? 


4. The set S consists of all points (x, y) such that $0 < xy \le 1$ and $x > 0$. Is this set
open, closed, or neither? What is B(S)?


5. The set S consists of all points (x, y) such that $y = \sin(1/x)$ and $x > 0$. Does this set
have any interior points? Is it closed? What is B(S)?

6. The set S consists of all points (x, y) for which $x^2 + y^2 < 4$ and $y > 0$ except for the
points with $0 < y \le 1$ and $x = 1/n$, $n = 1, 2, \ldots$, i.e., except for the points of a certain
infinite sequence of line segments each one unit long. Is this set S open? What is the
boundary of S? Are there any points of B(S) which are also in S? Is S a region? Is its
complementary set C(S) a region?


7. 


Let /(x, y) - 


(y 



the function being defined whenever this expression 


has a meaning, but not otherwise. Is the set of points where / is defined a region? What 
is the boundary of the set? 

8. Let $f(x, y) = \log \sin x + y^{-1/2}$, the function being defined whenever this expression
has a meaning (real numbers only are to be considered). Describe, with the aid of a 
diagram, the set of points (x, y) where / is defined. Is the set open, closed, or neither? Of 
what does its boundary consist? 


5.2 / LIMITS 

We wish to define what is meant by the statement “$f(x, y)$ approaches $A$ as a
limit when the point $(x, y)$ approaches $(x_0, y_0)$.” The statement is written in the
form

$$\lim_{(x, y) \to (x_0, y_0)} f(x, y) = A. \qquad (5.2\text{-}1)$$

In giving the definition we shall assume that the function $f$ is defined in a region
$R$ and that $(x_0, y_0)$ is either an interior point of $R$ or on the boundary of $R$. The
point $(x_0, y_0)$ may, but need not, belong to $R$. If it is in $R$, the meaning of (5.2-1)
has nothing whatever to do with the value $f(x_0, y_0)$ at the point $(x_0, y_0)$. The
statement (5.2-1) is now defined to mean that if $\epsilon$ is any positive number, there is
some neighborhood of $(x_0, y_0)$ such that if $(x, y)$ is in the neighborhood, in $R$, and
different from $(x_0, y_0)$, then $|f(x, y) - A| < \epsilon$. This definition may be compared
with that for functions of one variable in §§1.1, 1.61. The limit notion can be
expressed verbally as follows: The meaning of (5.2-1) is that $f(x, y)$ is in a
prescribed neighborhood of $A$ on the real axis provided $(x, y)$ is any point other
than $(x_0, y_0)$ in a suitably chosen (sufficiently small) neighborhood of $(x_0, y_0)$ in
the plane.

With the adoption of the term neighborhood we obtain a unification of the 
limit concept for functions of one, two, or three independent variables. The 
extension to more than three variables causes no trouble and involves no new 
principle. We continue to use geometric language; the meaning of a “spherical
neighborhood” in a space of four variables is made clear by the inequality

$$(x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2 + (w - w_0)^2 < \delta^2.$$

The fundamental theorems about limits carry over to functions of several 
variables. We cite particularly Theorems X (§1.61) and XIV (§1.64). 

When we say that $f(x, y) \to A$ as $(x, y) \to (x_0, y_0)$, it must be stressed that the
limit must exist and be the same, no matter how $(x, y)$ approaches $(x_0, y_0)$. The
student will recall that, for a function of one variable, $f(x) \to A$ as $x \to x_0$ means
that $f(x) \to A$ as $x \to x_0+$ and also as $x \to x_0-$. But in the case of two variables,
$(x, y)$ can approach $(x_0, y_0)$ in an infinite number of ways. If it is possible to find
two different modes of approach to $(x_0, y_0)$ such that $f(x, y)$ approaches different
limits in the two cases, or no limit at all in at least one of the cases, then
$\lim_{(x, y) \to (x_0, y_0)} f(x, y)$ does not exist.

Example 1. Let $f(x, y) = \dfrac{x^2 - y^2}{x^2 + y^2}$. This function is defined except at the origin.
Let us show that the limit of $f(x, y)$ as $(x, y) \to (0, 0)$ does not exist. If $(x, y) \to (0, 0)$
along the x-axis, we have $f(x, 0) = 1$ $(x \ne 0)$. If $(x, y) \to (0, 0)$ along the y-axis,
we have $f(0, y) = -1$ $(y \ne 0)$. Thus the limits for the two modes of approach are 1
and -1 respectively. This shows that $f(x, y)$ has no limit as $(x, y) \to (0, 0)$.
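The dependence on the mode of approach in Example 1 is easy to see numerically. In the sketch below the helper `along_line` is an illustrative name, not notation from the text; it evaluates the function at the point (t, mt) of the line y = mx, where the value works out to (1 - m²)/(1 + m²) for every t ≠ 0.

```python
def f(x: float, y: float) -> float:
    """(x^2 - y^2) / (x^2 + y^2), defined away from the origin."""
    return (x * x - y * y) / (x * x + y * y)

def along_line(m: float, t: float) -> float:
    """Value of f at the point (t, m t) on the line y = m x, with t != 0."""
    return f(t, m * t)

# Columns: x-axis, y-axis, the line y = x, the line y = 2x.
for t in (0.1, 0.001):
    print(f(t, 0.0), f(0.0, t), along_line(1.0, t), along_line(2.0, t))
```

Each column is constant in t but the columns differ from one another, which is exactly the situation in which no limit at the origin can exist.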

To prove directly that a certain function approaches a certain limit as
$(x, y) \to (x_0, y_0)$, we have to work with inequalities. The following example will
illustrate the technique. It is not our intent, at this stage of a student’s training, to 
have him cultivate extensively the technique of working exercises of the type 
represented by the example. The purpose is merely to make clearer the essential 
content of the definition of a limit. 


Example 2. Show that

$$\lim_{(x, y) \to (0, 0)} \frac{2x^3 - y^3}{x^2 + y^2} = 0. \qquad (5.2\text{-}2)$$

In terms of inequalities, this means that if $\epsilon$ is any positive number, we have to
show that another positive number $\delta$ (depending on $\epsilon$) can be found, such that

$$\left| \frac{2x^3 - y^3}{x^2 + y^2} \right| < \epsilon \quad \text{if} \quad 0 < (x^2 + y^2)^{1/2} < \delta; \qquad (5.2\text{-}3)$$

in other words, denoting the function under consideration by $f(x, y)$, we have to
show that, if $\epsilon > 0$, there is some circular neighborhood of the origin (whose
radius we denote by $\delta$) such that $|f(x, y) - 0| < \epsilon$ if $(x, y)$ is in the specified
neighborhood of the origin but not actually at the origin. We proceed to find such
a number $\delta$, considering $\epsilon$ as given.

Now $|2x^3 - y^3| \le 2|x|^3 + |y|^3 = 2|x|x^2 + |y|y^2$. Also,

$$|x| \le (x^2 + y^2)^{1/2} \quad \text{and} \quad |y| \le (x^2 + y^2)^{1/2}.$$

Hence

$$\left| \frac{2x^3 - y^3}{x^2 + y^2} \right| \le 2(x^2 + y^2)^{1/2} \quad \text{if} \quad 0 < x^2 + y^2.$$

It is now clear that (5.2-3) will hold if $\delta$ is chosen in any manner such that
$0 < \delta \le \epsilon/2$. Thus (5.2-2) is proved.
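The choice δ = ε/2 made in Example 2 can be spot-checked by sampling points of the punctured neighborhood. This is a sketch under stated assumptions: the function names and the polar sampling grid are illustrative choices made here, and a finite sample supports the inequality without replacing the proof.

```python
import math

def f(x: float, y: float) -> float:
    """(2 x^3 - y^3) / (x^2 + y^2), defined away from the origin."""
    return (2.0 * x ** 3 - y ** 3) / (x * x + y * y)

def bound_holds(eps: float, steps: int = 200) -> bool:
    """Check |f(x, y)| < eps at sampled points with 0 < (x^2 + y^2)^(1/2) < eps / 2."""
    delta = eps / 2.0
    for i in range(steps):
        radius = delta * (i + 1) / (steps + 1)   # strictly between 0 and delta
        for j in range(steps):
            theta = 2.0 * math.pi * j / steps
            x, y = radius * math.cos(theta), radius * math.sin(theta)
            if not abs(f(x, y)) < eps:
                return False
    return True

print(bound_holds(0.1), bound_holds(1e-3))
```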

In work of this kind the student will find the simple inequalities

$$|a + b| \le |a| + |b|, \qquad 2|ab| \le a^2 + b^2, \qquad (5.2\text{-}4)$$

$$|a - b| \le |a| + |b| \le \sqrt{2}\,(a^2 + b^2)^{1/2} \qquad (5.2\text{-}5)$$

quite useful. See Exercise 6 for remarks about these inequalities.


EXERCISES 

1. Find the limit of $\dfrac{x^2 - y^2}{x^2 + y^2}$ as (x, y) approaches (0, 0) along the line y = x; along the
line y = mx.

2. Does $\lim_{(x, y) \to (0, 0)} \dfrac{xy}{x^2 + y^2}$ exist? Give reasons.

3. Examine the behavior of $\dfrac{x^4 y^4}{(x^2 + y^4)^3}$ as (x, y) approaches (0, 0) along various
straight lines. Then consider what happens for approach to the origin along the curve
$y^2 = x$. Is there a limit as $(x, y) \to (0, 0)$ without restriction?

4. Show in each case that the given function does not approach a limit as 
(x, y ) — > (0, 0), by examining the behavior of the function for at least two modes of 
approach. 

* -JL * 2 +y 


(a) 


(b) 


x 2 + y 2 ' 
xy 2 


(c) 


(x + y ) 


2x1/2- 


(d) $\dfrac{x^4 + 3x^2y^2 + 2xy^3}{(x^2 + y^2)^2}$.


x + y 

5. Define a function by setting $f(x, y) = 0$ if $y \le 0$ or if $y \ge x^2$, and $f(x, y) = 1$ if
$0 < y < x^2$. Show that $f(x, y) \to 0$ as $(x, y) \to (0, 0)$ along any straight line through the origin.
Find a curve through the origin along which $f(x, y) = 1$ (except at the origin).

6. If A and B are nonnegative numbers, the inequality $A \le B$ is equivalent to
$A^2 \le B^2$. Use this fact to prove the correctness of (5.2-4); then show that (5.2-5) is
correct.

7. Let $f(x, y) = xy\,\dfrac{x^2 - y^2}{x^2 + y^2}$. Show that $|f(x, y)| \le \tfrac{1}{2}(x^2 + y^2)$, and hence prove that
f(x, y) approaches a limit as $(x, y) \to (0, 0)$.

8. Let $f(x, y) = \dfrac{x^2 y^2}{x^2 + y^2}$. If $\epsilon > 0$, find $\delta$ so that $0 < (x^2 + y^2)^{1/2} < \delta$ implies $|f(x, y)| < \epsilon$.

9. If $\epsilon > 0$, show that $|2x^2 - 6xy + 5y^2| < \epsilon$ when $(x^2 + y^2)^{1/2} < (\epsilon/13)^{1/2}$.

10. Show that $\dfrac{x^4 + y^4}{x^2 + y^2} < \epsilon$ if $0 < x^2 + y^2 < \delta^2$, for a suitably chosen $\delta$ depending on $\epsilon$.

11. Show that $|x^3 - y^3| \le (x^2 + y^2)^{3/2}$.

12. Does y - approach a limit as (x, y)-»(0, 0)? 

5.3 / CONTINUITY 

The notion of continuity depends on the notion of limit, as was pointed out at 
the beginning of Chapter 3. 

Definition. Let f(x, y) be defined in a region R, and let $(x_0, y_0)$ be a point of R. We
say that f is continuous at this point if

$$\lim_{(x, y) \to (x_0, y_0)} f(x, y) = f(x_0, y_0).$$

If $(x_0, y_0)$ is an interior point of R, the mode of approach of (x, y) to $(x_0, y_0)$ is
unrestricted in this definition. But, if $(x_0, y_0)$ is a boundary point of R, there is the
restriction that (x, y) must remain in R. We say that f is continuous in R if it is
continuous at each point of R.

If f and g are defined in the same region R, and each is continuous at a point
$(x_0, y_0)$ of R, then the sum and product functions

$$f(x, y) + g(x, y), \qquad f(x, y)\,g(x, y)$$

are also continuous at $(x_0, y_0)$. The quotient function

$$\frac{f(x, y)}{g(x, y)}$$

is continuous at $(x_0, y_0)$ provided $g(x_0, y_0) \ne 0$. These assertions are direct
generalizations of Theorem I, §3. They may be extended to functions of more
than two variables.

The theorems of Chapter 3 all have important analogues for functions of
several independent variables. We do not wish at this point to prove all these 
analogous theorems, but we shall discuss the statements of certain theorems 
which will be used in the chapters immediately following. 

In dealing with the analogues of Theorems II (§3.1) and III (§3.2) of Chapter 
3 it is necessary to introduce the concept of a bounded point set. 

Definition. A point set S in the plane is called bounded if all its points are inside
some sufficiently large circle. For a point set in space the definition is similar; we
write “sphere” instead of “circle.”

Examples . The interior and boundary of a triangle form a bounded point set. 
The set of all points between the lines y = 0, y = 1 is not a bounded point set. 

We now state two important theorems. 

THEOREM I. If a function is continuous at each point of a closed and bounded
region R, the function is bounded on the region (i.e., the values of the
function form a bounded set of real numbers).

THEOREM II. Let f be continuous on a closed and bounded region R . Let m 
and M be the greatest lower bound and least upper bound of the values of f 
on R. Then f takes on each of the values m, M at least once in R. 


Proofs of these two theorems will be considered later, in §§17.2, 17.3. 
Theorem V of Chapter 3 (§3.3) has the following analogue. We state it for 
the case of two independent variables. 


THEOREM III. Let f be defined in an open set containing the point $(x_0, y_0)$.
Suppose that f is continuous at the point and that $f(x_0, y_0) \ne 0$. Then there is
a neighborhood of $(x_0, y_0)$ throughout which f(x, y) has the same sign as at
$(x_0, y_0)$.

The proof is left to the student. 

There is a feature of the continuity of a function f(x, y) which deserves
notice. If we fix y, say $y = y_0$, $f(x, y_0)$ is a function of x alone. Likewise $f(x_0, y)$ is
a function of y alone. It can happen that each of these functions of a single
variable is continuous, and yet that f(x, y) is not continuous. An illustration of
this possibility is given in Exercise 3.
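
The phenomenon just described can be made concrete by computation. The sketch below, in Python, uses the function of Exercise 3, namely $xy/(x^2 + y^2)$ with the value 0 assigned at the origin, and tabulates it along the two axes and along the line y = x. The sample values of t are arbitrary assumptions; the computation illustrates the behavior and does not prove it.

```python
def f(x, y):
    """xy/(x^2 + y^2) away from the origin, with f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

ts = (1e-1, 1e-3, 1e-6)                  # points approaching the origin

along_x_axis = [f(t, 0.0) for t in ts]   # values of f(x, 0)
along_y_axis = [f(0.0, t) for t in ts]   # values of f(0, y)
along_diagonal = [f(t, t) for t in ts]   # values on the line y = x

print(along_x_axis)     # every entry is 0, matching f(0, 0)
print(along_y_axis)     # every entry is 0, matching f(0, 0)
print(along_diagonal)   # every entry is one half, however small t is
```

Along each axis the values agree with f(0, 0) = 0, but along the line y = x they remain at one half, so f(x, y) cannot tend to f(0, 0) as $(x, y) \to (0, 0)$ without restriction.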

There is another theorem which will be needed later. It deals with composite 
functions, and may be roughly stated in the form: A continuous function of 
continuous functions is continuous. The number of variables is immaterial. 

Examples. $F(z) = \sin z$ is a continuous function of z, and $f(x, y) = (1 + xy)^2$ is
a continuous function of x, y. Therefore

$$F(f(x, y)) = \sin (1 + xy)^2$$

is a continuous function of x, y. Or, again,

$$F(x, y, z) = x^2 + y^2 + z^2 \quad \text{and} \quad f(x, y) = x(1 + x^2 + y^2)^{-3/2}$$

are continuous functions of x, y, z and x, y, respectively. Therefore

$$F(x, y, f(x, y)) = x^2 + y^2 + \frac{x^2}{(1 + x^2 + y^2)^3}$$

is a continuous function of x, y.

We formalize one such theorem about composite functions. 




THEOREM IV. Let F(x, y, z) be continuous in an open set B of space. Let
f(x, y) be continuous in an open set R of the xy-plane. Writing z = f(x, y),
suppose that the point (x, y, z) is in B when (x, y) is in R. Then the composite
function F(x, y, f(x, y)) is continuous in R.

We shall not give a proof here. This theorem is a special case of the
Theorem II which is proved in §11.7.

EXERCISES 

1. If fix, y) = * 2 ~ when x 2 + y 2 ^0, how must /( 0,0) be defined so as to 

X ~r y 

make / continuous at (0, 0)? 

2. Let us define fix , y) = if x^ 0, and fix , y) = y if x = 0. Does / have any 

points of discontinuity? 

3. If we define $f(x, y) = xy/(x^2 + y^2)$ when $x^2 + y^2 \ne 0$, and f(0, 0) = 0, show that f is
discontinuous at (0, 0), but also that f(x, 0) and f(0, y) are continuous functions of x and y,
respectively, with no exceptions.

4. Let $f(x, y) = (x^2 + y^2) \tan^{-1}(y/x)$ if $x \ne 0$, and define f(0, 0) = 0, but do not consider
f defined if x = 0 and $y \ne 0$. (a) Is f continuous at (0, 0) according to the definition in the
text? (b) Is it possible to define f at the one additional point (0, 1) so as to make it continuous
there?

5. Let $f(x, y) = xy \log(xy)$ if xy > 0, and define f(x, y) = 0 if xy = 0. Where, if at all,
is f discontinuous?

6. Let $f(x, y) = (5x + y)/(x - y)$. Show directly by the definition that f is continuous
at (4, 1) by proving that, if $\epsilon > 0$, $|f(x, y) - f(4, 1)| < \epsilon$ provided (x, y) is in a sufficiently
small neighborhood of (4, 1). Start by showing that

$$|f(x, y) - f(4, 1)| \le 2|x - 4| + 8|y - 1|$$

at the points of the square 3 < x < 5, 0 < y < 2.

7. If $f(x, y) = e^{-1/|x - y|}$ when $x \ne y$, how must f be defined when x = y so as to make
it continuous at all points of the plane?

8. Let us define f(x, y) = 0 if $y \le 0$ or if $x^2 \le y$, and $f(x, y) = 4y(x^2 - y)/x^4$ if
$0 < y < x^2$.
(a) Is f continuous at (0, 0)? (b) Discuss possible discontinuity at other points on the line
y = 0 or the curve $y = x^2$.

9. Let $f(x, y) = x(1 - x^2 - y^2)^{-1/2}$, the region R of definition being defined by
$x^2 + y^2 < 1$. Is it possible to add the single point (0, 1) to R and define f so as to make it
continuous at that point? Consider values of f on the circle $x^2 + y^2 - y = 0$, and also at
other points in R near (0, 1).

10. If $f(x, y, z) = xyz/(x^2 + y^2 + z^2)$ when $x^2 + y^2 + z^2 \ne 0$, is it possible to define
f(0, 0, 0) so as to make f continuous at the origin?


5.4 / MODES OF REPRESENTING A FUNCTION 


The standard method of representing a function of one variable is by graphing in
rectangular co-ordinates. We write y = f(x) and plot the points (x, y). If f is
continuous on an interval, the graph will be a curve in the plane.

The corresponding procedure for the case of two independent variables is
familiar. We write z = f(x, y), and plot the points (x, y, z). If f is continuous in a
region R of the xy-plane we obtain a surface in space (see Fig. 36).

Fig. 37.


When we go to three independent variables there is no satisfactory analogue 
of the foregoing methods of graphical representation, for we cannot draw upon 
any familiar geometric intuition to visualize w = f(x, y, z) as defining a 
configuration in space of four dimensions. There is, however, another mode of 
representation which is helpful. It is available as well in the case of two 
independent variables, and since the figures are easier to draw, we begin with that 
case. 

When f is defined in a region R, we can think of each point of R as being
given a label, namely, the value f(x, y) at that point. A good example is obtained
by thinking of the xy-plane as a map on which elevations above sea level are
marked at various locations, f(x, y) being the elevation in feet at (x, y) (see Fig.
37). To carry this example further, imagine that the map is a topographic map
with contour lines drawn in, showing lines of equal elevation. Each line is
labeled; there is a line for 500 feet above sea level, others for 400, 600, and so
on. In the aggregate, the configuration of these lines, together with their
numbering, gives us a good visual representation of the elevation as a function
of x and y.

This “topographic map” idea can be carried over to any function f(x, y).
Instead of contour lines we consider curves along which f(x, y) is constant in
value. Such a curve is called a level curve of the function. If the constant value is
C, the equation of the level curve is f(x, y) = C. See Fig. 38 and Fig. 39. Isobars
(curves of equal atmospheric pressure) on a meteorological chart furnish another
good example of level curves of a function.

Fig. 38. Level curves of $f(x, y) = x + y^2$.

Fig. 39. Level curves of $f(x, y) = x^2 + y^2 - 2$.

The three-dimensional analogue of this mode of representation is now easily
grasped. Instead of level curves we shall have level surfaces f(x, y, z) = C.
Common physical examples of functions of three variables which are conveniently
visualized in this way are density and temperature in a gas or other
medium.





6 / THE ELEMENTS OF PARTIAL DIFFERENTIATION


6 / PARTIAL DERIVATIVES 

In this chapter the main objective is the exposition of the fundamentals of 
differential calculus for functions of two or more variables, with emphasis on 
the formal procedures which are used in dealing with various kinds of particular 
problems. The subject is developed from the beginning. Many students will 
already know some things about partial differentiation from courses in elementary
calculus. For them the earlier parts of this chapter will serve as a review. By
concentrating on procedures and techniques in this chapter, and putting most of 
the theoretical considerations in later chapters, the authors have hoped to make 
the treatment of partial differentiation adaptable to the needs of students of 
varying degrees of preparation. For further expression of the authors’ intentions 
in the organization of Chapters 6, 7, and 8, see the introductory sections of 
Chapters 7 and 8. 

Let f(x, y) be defined in a region R of the xy-plane. If we think of y as fixed
and x as variable, the derivative of f(x, y) with respect to x is called the partial
derivative with respect to x. This partial derivative is denoted by $\dfrac{\partial f}{\partial x}$. If we write
u = f(x, y), the partial derivative is also denoted by $\dfrac{\partial u}{\partial x}$. Likewise, the partial
derivative with respect to y, $\dfrac{\partial f}{\partial y}$ or $\dfrac{\partial u}{\partial y}$, is the derivative of f(x, y) with respect to
y when x is regarded as a constant.

Example 1. If $u = x^2 y + e^{-xy^3}$,

$$\frac{\partial u}{\partial x} = 2xy - y^3 e^{-xy^3}, \qquad \frac{\partial u}{\partial y} = x^2 - 3xy^2 e^{-xy^3}.$$

Similar definitions and notations apply in dealing with functions of three or 
more independent variables. 

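The partial derivatives of $\dfrac{\partial f}{\partial x}$ and $\dfrac{\partial f}{\partial y}$ are called the second partial derivatives
of f(x, y). There are in all four second derivatives of f(x, y). The notations for
these derivatives, if we write u = f(x, y), are the following:

$$\frac{\partial}{\partial x}\left(\frac{\partial u}{\partial x}\right) = \frac{\partial^2 u}{\partial x^2}, \qquad \frac{\partial}{\partial y}\left(\frac{\partial u}{\partial x}\right) = \frac{\partial^2 u}{\partial y\,\partial x},$$

$$\frac{\partial}{\partial x}\left(\frac{\partial u}{\partial y}\right) = \frac{\partial^2 u}{\partial x\,\partial y}, \qquad \frac{\partial}{\partial y}\left(\frac{\partial u}{\partial y}\right) = \frac{\partial^2 u}{\partial y^2}.$$

Example 2. For the function of Example 1 we have

$$\frac{\partial^2 u}{\partial x^2} = 2y + y^6 e^{-xy^3},$$

$$\frac{\partial^2 u}{\partial y\,\partial x} = 2x + 3xy^5 e^{-xy^3} - 3y^2 e^{-xy^3},$$

$$\frac{\partial^2 u}{\partial x\,\partial y} = 2x + 3xy^5 e^{-xy^3} - 3y^2 e^{-xy^3},$$

$$\frac{\partial^2 u}{\partial y^2} = 9x^2 y^4 e^{-xy^3} - 6xy e^{-xy^3}.$$

We observe that

$$\frac{\partial^2 u}{\partial y\,\partial x} = \frac{\partial^2 u}{\partial x\,\partial y} \tag{6-1}$$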


in this example. We shall ordinarily find that the relation (6-1) holds true for the 
functions we meet in practice, for the relation is valid at a point provided both 
the second derivatives are defined in a neighborhood of the point and continuous 
at the point. This will be proved in §7.2 (Theorem III). 
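
The equality of the two mixed derivatives for the function $u = x^2 y + e^{-xy^3}$ of Example 1 can be checked numerically. The following Python sketch differentiates the first partial derivatives by central differences and compares both results with the closed form $2x + 3xy^5 e^{-xy^3} - 3y^2 e^{-xy^3}$. The sample point (0.7, 1.2), the step size, and the helper names are assumptions chosen for illustration.

```python
import math

def u_x(x, y):
    """du/dx for u = x^2 y + exp(-x y^3)."""
    return 2 * x * y - y**3 * math.exp(-x * y**3)

def u_y(x, y):
    """du/dy for the same function u."""
    return x**2 - 3 * x * y**2 * math.exp(-x * y**3)

def mixed_exact(x, y):
    """Closed form of the mixed second derivative."""
    e = math.exp(-x * y**3)
    return 2 * x + 3 * x * y**5 * e - 3 * y**2 * e

def central_difference(g, t, h=1e-5):
    """Symmetric difference quotient approximating g'(t)."""
    return (g(t + h) - g(t - h)) / (2 * h)

x0, y0 = 0.7, 1.2   # arbitrary sample point (an assumption)

d2u_dydx = central_difference(lambda y: u_x(x0, y), y0)   # d/dy of du/dx
d2u_dxdy = central_difference(lambda x: u_y(x, y0), x0)   # d/dx of du/dy

print(d2u_dydx, d2u_dxdy, mixed_exact(x0, y0))
```

The three printed numbers should agree to within the accuracy of the difference quotients, which is consistent with relation (6-1) at the sample point.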


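A partial derivative of f(x, y) is again a function of x, y. To denote the value
of $\dfrac{\partial f}{\partial x}$ at the point $(x_0, y_0)$ we may use one of the expressions

$$\left(\frac{\partial f}{\partial x}\right)_{(x_0, y_0)}, \qquad \left.\frac{\partial f}{\partial x}\right|_{(x_0, y_0)}.$$

These notations are rather awkward, however; it is desirable to have a standard
functional notation for partial derivatives. For a function f(x, y) of two independent
variables we shall write

$$f_1(x, y) = \frac{\partial f}{\partial x}, \qquad f_2(x, y) = \frac{\partial f}{\partial y}.$$

For the value of a partial derivative at a point we then have expressions such as

$$f_1(x_0, y_0) = \left(\frac{\partial f}{\partial x}\right)_{(x_0, y_0)}, \qquad f_2(a, b) = \left(\frac{\partial f}{\partial y}\right)_{(a, b)}.$$

For second derivatives we use the notations

$$f_{11}(x, y) = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right), \qquad f_{12}(x, y) = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right),$$

$$f_{21}(x, y) = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right), \qquad f_{22}(x, y) = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right).$$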

Observe the ordering of the numerical subscripts in relation to the order of 
carrying out the differentiations; 1 refers to x and 2 refers to y. 

The notation is extended in an obvious way to derivatives of order higher than 
the second, and also to functions of more than two independent variables. Thus 
for example, 

Mx, y) = £- 2 g), 

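and $g_3(x, y, z) = \dfrac{\partial g}{\partial z}$, $\quad g_{123}(x, y, z) = \dfrac{\partial}{\partial z}\left[\dfrac{\partial}{\partial y}\left(\dfrac{\partial g}{\partial x}\right)\right]$.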


6-1 / IMPLICIT FUNCTIONS 


We often deal with functions which are defined implicitly as the solution of 
certain equations. In ordinary practice we can find the partial derivatives of such 
a function by the same procedures which we learn in elementary calculus. 


Example 1. Find $\dfrac{\partial z}{\partial x}$ from the equation

$$\frac{x^2}{16} + \frac{y^2}{12} + \frac{z^2}{9} = 1, \tag{6.1-1}$$

on the understanding that z is dependent and x, y are independent.

We have

$$\frac{2x}{16} + \frac{2z}{9}\frac{\partial z}{\partial x} = 0,$$

and so

$$\frac{\partial z}{\partial x} = -\frac{9x}{16z}. \tag{6.1-2}$$

The equation (6.1-1) actually defines two functions of (x, y), corresponding to
the two choices of sign in

$$z = \pm 3\left(1 - \frac{x^2}{16} - \frac{y^2}{12}\right)^{1/2}. \tag{6.1-3}$$

By substituting (6.1-3) in (6.1-2) we obtain the partial derivative for each of
these two functions:

$$\frac{\partial z}{\partial x} = \mp\frac{3x}{16}\left(1 - \frac{x^2}{16} - \frac{y^2}{12}\right)^{-1/2}. \tag{6.1-4}$$

The result (6.1-4) could also have been obtained by differentiating (6.1-3)
directly.

The procedure can also be applied in the case of functions defined by 
simultaneous equations. 

Example 2. If u and v are defined as functions of x, y by the equations

$$u \cos v - x = 0, \qquad u \sin v - y = 0, \tag{6.1-5}$$

find the partial derivatives $\dfrac{\partial u}{\partial x}$, $\dfrac{\partial v}{\partial x}$.

Method I. One method of procedure is to attempt to solve for u, v in terms 
of x , y. If this can be accomplished, we can then calculate the required partial 
derivatives directly. 

From (6.1-5) we have

$$u^2 \cos^2 v = x^2, \qquad u^2 \sin^2 v = y^2.$$

Now add these equations and use a familiar trigonometric identity. The result is

$$u^2 = x^2 + y^2, \quad \text{or} \quad u = \pm\sqrt{x^2 + y^2}. \tag{6.1-6}$$

Next, going back to (6.1-5), we substitute the value just found for u. We find

$$\cos v = \frac{x}{\pm\sqrt{x^2 + y^2}}, \qquad \sin v = \frac{y}{\pm\sqrt{x^2 + y^2}}, \tag{6.1-7}$$

the same sign being taken before the radical in both cases. We might also write

$$\tan v = \frac{y}{x} \tag{6.1-8}$$

in case $x \ne 0$. We see that there are in general two possible determinations of u
from (6.1-6); for v there are an infinite number of possible determinations from
(6.1-7), differing by multiples of $2\pi$. The derivatives of u may be found from
(6.1-6):

$$\frac{\partial u}{\partial x} = \pm\frac{x}{\sqrt{x^2 + y^2}}.$$

In finding the derivatives of v it is easier to work from (6.1-8). We have

$$\sec^2 v\,\frac{\partial v}{\partial x} = -\frac{y}{x^2}, \qquad \sec^2 v = 1 + \tan^2 v = 1 + \frac{y^2}{x^2},$$

$$\frac{\partial v}{\partial x} = -\frac{y}{x^2 \sec^2 v} = -\frac{y}{x^2 + y^2}.$$

This result could have been obtained by expressing v as an inverse tangent and 
then differentiating. It should, however, be noted that v is not necessarily the 
principal value of the inverse tangent of y/x. 


Method II. As an alternative to the first method we may proceed directly to
differentiate the equations (6.1-5). In doing so we must bear in mind that x, y are
independent variables and that u, v are dependent variables. Since we are
seeking partial derivatives with respect to x, we think of y as a constant. Then

$$\cos v\,\frac{\partial u}{\partial x} - u \sin v\,\frac{\partial v}{\partial x} - 1 = 0,$$

$$\sin v\,\frac{\partial u}{\partial x} + u \cos v\,\frac{\partial v}{\partial x} - 0 = 0. \tag{6.1-9}$$

These simultaneous equations are now solved for $\dfrac{\partial u}{\partial x}$ and $\dfrac{\partial v}{\partial x}$. The solution may
be achieved by elimination, or by determinants and Cramer’s rule. We leave it
for the student to carry out the solution and find

$$\frac{\partial u}{\partial x} = \cos v, \qquad \frac{\partial v}{\partial x} = -\frac{\sin v}{u}. \tag{6.1-10}$$

To reconcile these answers with the answers as found by Method I, observe
that, by (6.1-5) and (6.1-6),

$$\cos v = \frac{x}{u} = \frac{x}{\pm\sqrt{x^2 + y^2}}, \qquad -\frac{\sin v}{u} = -\frac{y}{u^2} = -\frac{y}{x^2 + y^2}.$$

One of the things to be noted in connection with problems like that of 
Example 2 is that the solution by Method II can be carried out even if the 
explicit solution for u and v in terms of x , y is impracticable. 
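
The solution of the equations (6.1-9) by Cramer's rule can be carried out mechanically. The Python sketch below treats (6.1-9) as a pair of linear equations in the unknowns $\partial u/\partial x$ and $\partial v/\partial x$ and solves them for given numerical values of u and v. The function name `partials_wrt_x` and the sample values u = 2, v = 0.6 are assumptions for illustration; the case u > 0 is taken so that the plus sign applies in (6.1-6).

```python
import math

def partials_wrt_x(u, v):
    """Solve (6.1-9) for p = du/dx and q = dv/dx by Cramer's rule.

    The system is   cos(v) p - u sin(v) q = 1,
                    sin(v) p + u cos(v) q = 0.
    """
    a11, a12 = math.cos(v), -u * math.sin(v)
    a21, a22 = math.sin(v), u * math.cos(v)
    det = a11 * a22 - a12 * a21          # equals u (cos^2 v + sin^2 v) = u
    if det == 0:
        raise ValueError("the system is singular when u = 0")
    du_dx = (1 * a22 - a12 * 0) / det    # first column replaced by (1, 0)
    dv_dx = (a11 * 0 - a21 * 1) / det    # second column replaced by (1, 0)
    return du_dx, dv_dx

u0, v0 = 2.0, 0.6                        # sample values (an assumption), with u > 0
du_dx, dv_dx = partials_wrt_x(u0, v0)

# Compare with Method I, using x = u cos v and y = u sin v from (6.1-5).
x0, y0 = u0 * math.cos(v0), u0 * math.sin(v0)
print(du_dx, x0 / math.sqrt(x0**2 + y0**2))
print(dv_dx, -y0 / (x0**2 + y0**2))
```

The determinant of the system is u itself, which shows why the procedure breaks down at u = 0, that is, at x = y = 0.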

In all the procedures which have been illustrated here it has been taken for 
granted that the given equations do implicitly define certain functions, and that 
these functions do have partial derivatives. The deeper theoretical questions on 
these matters are considered fully in Chapter 8. 


EXERCISES 

1. Find $\partial u/\partial y$ and $\partial v/\partial y$ from Example 2. Use both of the methods given in the text, and
reconcile your answers.

2. Suppose that u and v are defined in terms of x, y by the equations $u^2 - v^2 = xy$,
$uv = x^2 + y^2$. Find the first partial derivatives of u and v with respect to x and y,
respectively. Use the implicit function procedure (Method II). See how far you can go
with Method I in this case.

3. Assuming that z is defined as a function of x and y by the equation
$4 \sin^2 x + 2 \cos(y + z) = 2$, find $\partial z/\partial x$ and $\partial z/\partial y$ when $x = \pi/6$, $y = \pi/3$, z = 0.

4. Find $\partial z/\partial x$ and $\partial z/\partial y$ from the equation $x^3 + y^3 + z^3 - 3xyz = 0$.

6.2 / GEOMETRICAL SIGNIFICANCE OF PARTIAL DERIVATIVES


Just as the ordinary derivative of a function of one variable has its geometric 
realization in the slope of a line which is tangent to a curve, so the partial 
derivatives of a function of two variables have a geometrical significance in 
connection with a plane which is tangent to a surface. Our purpose in this 

section is to show how the partial derivatives $\dfrac{\partial z}{\partial x}$ and $\dfrac{\partial z}{\partial y}$, when x = a and y = b,
are related to the plane which is tangent to the surface z = f(x, y) at the point
(a, b, c). In this section we shall not give a formal definition of the tangent plane.
We reserve full discussion of this matter to §6.4, because the concept of the 
tangent plane is the geometrical counterpart of the concept of the differential of 
a function of two variables. 

Let S be the surface z = f(x, y), and let (a, b, c) be a point on S. Then
c = f(a, b). Consider the line through the point (a, b, c) parallel to the z-axis. Let
us visualize various planes containing this line, each such plane cutting the 
surface S in a curve. One such plane, cutting the y-axis 
perpendicularly at y = b, is shown in Fig. 40. In the 
diagram, the curve of the intersection of S and this 
plane, y = b, is represented as having a tangent line L at 
the point (a, b, c). One can also imagine a plane x = a, 
passing through (a, b, c) and intersecting the x-axis 
perpendicularly at (a, 0,0). More generally, one can 
imagine a plane different from either of these two, but 
so placed that it passes through (a, b, c) and is parallel 
to the z-axis. The student should construct for himself a 
diagram similar to Fig. 40, with a plane through (a, b, c) 
parallel to the z-axis but cutting neither the x-axis nor 
the y-axis at right angles. This exercise in geometrical 
visualization will be helpful for the following discussion. 

Suppose that each plane through (a, b, c) and parallel to the z-axis cuts the 
surface S in a curve which has at the point (a, b, c) a tangent line which is not 
parallel to the z-axis. Suppose further that all of these tangent lines, corresponding
to the different planes of the type described, lie in a single plane. Then this
single plane must surely be the tangent plane to the surface S at (a, b, c), if indeed
there is such a tangent plane. Fig. 40 shows four different curves on S, together
with their tangent lines, all intersecting at (a, b, c).

Assuming now that there is a tangent plane to S at (a, b, c) not parallel to the
z-axis, let us see how to find its equation. If $\cos \alpha$, $\cos \beta$, $\cos \gamma$ are the direction
cosines of a line which is normal (perpendicular) to this plane, the equation of
the plane can be written

$$(\cos \alpha)(x - a) + (\cos \beta)(y - b) + (\cos \gamma)(z - c) = 0.$$

Since the plane is not parallel to the z-axis, we know that $\cos \gamma \ne 0$; we can
therefore solve for z - c by dividing by $\cos \gamma$, thus obtaining an equation of the
form

$$z - c = A(x - a) + B(y - b). \tag{6.2-1}$$

Our problem is to find A and B. If we put y = b in this equation, we find the
relation between x and z on the line in which the tangent plane (6.2-1) cuts the
plane y = b. This line of intersection must be the line L which is tangent to the
curve in which the surface S is cut by the plane y = b (see Fig. 40). The relation
between x and z on L is obtained by putting y = b in (6.2-1); that is,

$$z - c = A(x - a).$$

The relation between x and z along the curve in which S is cut by the plane
y = b is

$$z = f(x, b).$$

Hence, according to the usual relationship between slopes and derivatives, A
must be the derivative of z with respect to x when y is held constantly equal to
b:

$$A = \left.\frac{\partial}{\partial x} f(x, b)\right|_{x = a} = f_1(a, b). \tag{6.2-2}$$

In the same way, putting x = a in (6.2-1) and interpreting B as a slope in the
relation between y and z, with x held constantly equal to a, we see that

$$B = \left.\frac{\partial}{\partial y} f(a, y)\right|_{y = b} = f_2(a, b). \tag{6.2-3}$$

Consequently, we see that the equation of the plane tangent to S at (a, b, c) can
be written in the form

$$z - c = f_1(a, b)(x - a) + f_2(a, b)(y - b), \tag{6.2-4}$$

or

$$z = f(a, b) + f_1(a, b)(x - a) + f_2(a, b)(y - b). \tag{6.2-5}$$

The foregoing discussion of the tangent plane is not complete because of the
fact that no full and formal definition of the tangent plane has been given.
However, the discussion here is wholly consistent with the definition to be given
in §6.4. It is shown there that when S has a tangent plane at (a, b, c), not parallel
to the z-axis, then f has partial derivatives when x = a and y = b. The arguments
we have just given can then be used to derive the equation (6.2-4) for the
tangent plane. Observe that the assumption that f has partial derivatives at
x = a, y = b rules out the possibility that the tangent plane might be parallel to
the z-axis.

The line through (a, b, c) perpendicular to the plane (6.2-5) is called the
normal to the surface S at that point. As we see from the equation (6.2-5), the
direction of the normal is determined by the ratios

$$f_1(a, b) : f_2(a, b) : -1,$$

for the direction cosines of the normal to a plane are proportional to the
coefficients of x, y, and z respectively, in the equation of the plane. Here is a
result to be remembered:

The line normal to the surface z = f(x, y) at a given point has direction ratios
$\dfrac{\partial z}{\partial x} : \dfrac{\partial z}{\partial y} : -1$, the partial derivatives being evaluated at the point in question.


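Example 1. (a) Find the equation of the plane tangent to the paraboloid
$48z = 2x^2 + 3y^2$ at the point $(3, 2, \tfrac{5}{8})$. (b) Find the direction cosines of the normal
to the surface at the point.

(a) We have

$$48\frac{\partial z}{\partial x} = 4x, \qquad 48\frac{\partial z}{\partial y} = 6y.$$

At the point in question, therefore, $\dfrac{\partial z}{\partial x} = \dfrac{\partial z}{\partial y} = \tfrac{1}{4}$, and the equation of the tangent
plane is

$$z - \tfrac{5}{8} = \tfrac{1}{4}(x - 3) + \tfrac{1}{4}(y - 2), \quad \text{or} \quad 2x + 2y - 8z = 5.$$

(b) To obtain the direction cosines from the ratios $\tfrac{1}{4} : \tfrac{1}{4} : -1$, we first compute

$$\left[\left(\tfrac{1}{4}\right)^2 + \left(\tfrac{1}{4}\right)^2 + (-1)^2\right]^{1/2} = \frac{3\sqrt{2}}{4}.$$

The direction cosines are found by dividing $\tfrac{1}{4}$, $\tfrac{1}{4}$, $-1$ by $\dfrac{3\sqrt{2}}{4}$. They are,
accordingly,

$$\frac{1}{3\sqrt{2}}, \qquad \frac{1}{3\sqrt{2}}, \qquad -\frac{4}{3\sqrt{2}}.$$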
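
The arithmetic of Example 1 is easily verified by machine. The following Python sketch evaluates the partial derivatives of $z = (2x^2 + 3y^2)/48$ at (3, 2), forms the tangent plane according to (6.2-5), and normalizes the direction ratios of the normal. The function and variable names are assumptions made for this illustration.

```python
import math

def f(x, y):
    """The paraboloid 48 z = 2 x^2 + 3 y^2, solved for z."""
    return (2 * x**2 + 3 * y**2) / 48

def f1(x, y):
    """Partial derivative with respect to x: 4x/48."""
    return x / 12

def f2(x, y):
    """Partial derivative with respect to y: 6y/48."""
    return y / 8

a, b = 3.0, 2.0
c = f(a, b)                    # height of the surface at (a, b)
A, B = f1(a, b), f2(a, b)      # slopes of the tangent plane

def tangent_plane(x, y):
    """z on the tangent plane, as in equation (6.2-5)."""
    return c + A * (x - a) + B * (y - b)

# Direction cosines of the normal, from the ratios A : B : -1.
length = math.sqrt(A**2 + B**2 + 1)
direction_cosines = (A / length, B / length, -1 / length)

print(c, A, B)
print(2 * 1.0 + 2 * 1.0 - 8 * tangent_plane(1.0, 1.0))   # should equal 5
print(direction_cosines)
```

The second printed line tests the equation 2x + 2y - 8z = 5 at a point of the plane other than the point of tangency.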


Example 2. Show that at every point of intersection of the two surfaces
$z = 2(x^2 + y^2)$, $8z = 17 - (x^2 + y^2)$, the normals to the two surfaces are perpendicular.
(Because of this we say that the surfaces intersect orthogonally.)

The student will readily find that the surfaces are paraboloids of revolution
intersecting each other all along the circle $x^2 + y^2 = 1$ in the plane z = 2. There
are no other intersections. Now, at a point of the first paraboloid the direction
ratios of the normal to the surface are

$$4x : 4y : -1;$$

for the second paraboloid the direction ratios of the normal are found to be

$$-\frac{x}{4} : -\frac{y}{4} : -1.$$

The condition for perpendicularity of these two normals at a point which is
common to the two surfaces is therefore

$$4x\left(-\frac{x}{4}\right) + 4y\left(-\frac{y}{4}\right) + (-1)(-1) = 0,$$

or $-(x^2 + y^2) + 1 = 0$. Since this equation is satisfied along the intersection of the
surfaces, the demonstration of perpendicularity is complete.

EXERCISES 

1. Find the equation of the tangent plane to the surface $z = e^{-x} \sin y$
(a) at x = 0, $y = \pi/2$, (b) at x = 0, $y = \pi$, (c) at x = 0, y = 0. (d) Make as good a
diagram as you can of the surface for $0 \le y \le \pi$, x > 0.

2. Find the equation of the plane tangent to the surface $x^3 + 2xy^2 - 7z^3 + 3y + 1 = 0$ at
(1, 1, 1).

3. Prove that the plane tangent to the surface $z = x^2 - y^2$ at the point (a, b, c) is
pierced by the z-axis at the point for which z = -c.

4. Find the points of the paraboloid $z = x^2 + y^2 - 1$ at which the normal to the
surface coincides with the line joining the origin to the point. What is the acute angle
between the normal and the z-axis at these points?

5. If $a^2 \ne b^2$, prove that no normal to the surface $z = (x^2/a^2) + (y^2/b^2) - c$, at a point
for which $x \ne 0$ and $y \ne 0$, can pass through the origin.

6. Prove that the spheres $x^2 + y^2 + z^2 = 16$, $x^2 + (y - 5)^2 + z^2 = 9$ intersect orthogonally,
using the method of Example 2.

6.3 / MAXIMA AND MINIMA 

We sometimes have occasion to inquire about the largest or smallest value
attained by a function under specified circumstances. In speaking about maximum
(or minimum) values it is very important to distinguish between a relative
maximum and an absolute maximum. Suppose we are dealing with a function
f(x, y) defined in a region R of the xy-plane.

Definition. We say that the function f has a relative maximum at the point (a, b)
if there is some neighborhood of (a, b) such that $f(x, y) \le f(a, b)$ for all points
(x, y) of R which are in this neighborhood. We may express the definition
otherwise by saying that the value of f at (a, b) is at least as big as at any of the
points (x, y) around (a, b) and not too far away.

Thus for instance, within a given range of mountains, the elevation of the land 
surface above sea level attains a relative maximum at the summit of any particular 
peak in the range. 

Definition. Let f be defined in a region R, and let S be any part of R (i.e., any
point-set in R). In particular, S might be all of R. Suppose there is in S a point
(a, b) such that $f(x, y) \le f(a, b)$ for all points (x, y) in S. We then say that on the
set S the function f has an absolute maximum at (a, b).

Observe that, on a given set S, f can have a relative maximum which is not
an absolute maximum. Observe also that a function may fail to have an absolute
maximum on a given set (think of the function 1/(xy) in the first quadrant).
Similar definitions are made for relative and absolute minima of a function.

In problems where we have to find the absolute maximum of a function on a
given set we usually find that it is convenient to begin by looking for relative 
maxima. If there are only a few of the latter we may be able easily to select one 
which furnishes an absolute maximum. Hence it is useful to have criteria for 
locating relative extrema. 

THEOREM I. Let f be defined on a region R, and let the function have a relative
extreme (maximum or minimum) at the point (a, b) of R. Suppose further
that (a, b) is an interior point of R (not on the boundary), and that f has first
partial derivatives at (a, b). Then these derivatives are zero at that point:

$$f_1(a, b) = 0, \qquad f_2(a, b) = 0. \tag{6.3-1}$$

Proof. This theorem should be compared with Theorem III of §1.12. The
proof is based on this earlier theorem. Consider f(x, b); this is a function of the
single variable x, its values being those of the function f(x, y) along the line
y = b. As a function of x, f(x, b) has a relative extreme at x = a. Moreover, the
derivative of f(x, b) at x = a is $f_1(a, b)$. Therefore, by Theorem III, §1.12, we
conclude that $f_1(a, b) = 0$. In the same way, applying this earlier theorem to the
function f(a, y) of the single variable y, we conclude that $f_2(a, b) = 0$.

The hypothesis that (a, b) is an interior point of R is essential. A relative 
extreme can occur at a boundary point of R, and in that case equations (6.3-1) 
may not hold. 

It is important to realize that, under the conditions stated in Theorem I, the 
vanishing of the first partial derivatives is a necessary, but not sufficient, 
condition for a relative extreme. If the surface z = f(x, y) has at the point x = a,
y = b a tangent plane which is parallel to the xy-plane, then equations (6.3-1) 
hold; but z need not be a relative extreme at such a point. A “saddle-point” of a 
surface is an illustration of such a situation. 

The foregoing definitions and Theorem I extend to functions of three or 
more variables in an obvious manner. 

As in elementary calculus, sufficient conditions for a relative maximum or 
minimum can be formulated by adding to (6.3-1) certain conditions on the 
second derivatives of f at the point (a, b). We discuss such conditions in §7.6.
For the present, however, we proceed to illustrate some uses of Theorem I. 

Example 1. Find the point of the plane 2x - 3y - 4z = 25 which is nearest to
the point (3, 2, 1).

If D is the distance from the point (x, y, z) of the plane to (3, 2, 1), we have
$D^2 = (x - 3)^2 + (y - 2)^2 + (z - 1)^2$ and $z = \tfrac{1}{4}(2x - 3y - 25)$. Hence, eliminating z,

$$D^2 = (x - 3)^2 + (y - 2)^2 + \left(\tfrac{1}{2}x - \tfrac{3}{4}y - \tfrac{29}{4}\right)^2.$$

We seek the minimum value of $D^2$ as x, y range through all possible values. In
this case all points are interior points of the region (namely the whole xy-plane),
and $D^2$ has partial derivatives at all points. We therefore look for points at which
$\dfrac{\partial (D^2)}{\partial x} = \dfrac{\partial (D^2)}{\partial y} = 0$. The equations to be considered are

$$2(x - 3) + 2\left(\tfrac{1}{2}x - \tfrac{3}{4}y - \tfrac{29}{4}\right) \cdot \tfrac{1}{2} = 0,$$

$$2(y - 2) + 2\left(\tfrac{1}{2}x - \tfrac{3}{4}y - \tfrac{29}{4}\right) \cdot \left(-\tfrac{3}{4}\right) = 0.$$

On simplifying, we obtain

$$10x - 3y = 53,$$

$$-6x + 25y = -55.$$

The solution is found to be x = 5, y = -1. Substituting in the equation of the
plane, we find z = -3. We now argue as follows: The function $D^2$ certainly has
an absolute minimum (from the geometrical nature of the problem). This 
absolute minimum is also a relative minimum, and the conditions of Theorem I 
apply. But we obtain a unique point at which the two first partial derivatives 
vanish. Hence, this point must furnish the desired absolute minimum. 
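
The arithmetic of Example 1 can be checked with a short sketch in exact rational arithmetic. Cramer's rule and the point-to-plane distance formula are used here only as independent checks; they are assumptions of this sketch rather than the method of the text.

```python
from fractions import Fraction as Fr

# Solve 10x - 3y = 53, -6x + 25y = -55 by Cramer's rule (exact arithmetic).
a11, a12, b1 = Fr(10), Fr(-3), Fr(53)
a21, a22, b2 = Fr(-6), Fr(25), Fr(-55)
det = a11 * a22 - a12 * a21
x = (b1 * a22 - a12 * b2) / det
y = (a11 * b2 - b1 * a21) / det

# Recover z from the plane 2x - 3y - 4z = 25 and form the squared distance.
z = (2 * x - 3 * y - 25) / 4
d2 = (x - 3) ** 2 + (y - 2) ** 2 + (z - 1) ** 2

# Independent check: squared distance from (3, 2, 1) to the plane.
dist2_formula = Fr((2 * 3 - 3 * 2 - 4 * 1 - 25) ** 2, 2 ** 2 + 3 ** 2 + 4 ** 2)

print(x, y, z, d2, dist2_formula)
```

Both computations of the squared distance agree, which supports the claim that the unique critical point gives the absolute minimum.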

Example 2. Locate the points which might furnish relative maxima and
minima of the function

$$f(x, y) = 2xy - (1 - x^2 - y^2)^{3/2}$$

in the closed region $x^2 + y^2 \le 1$ (which is the region of definition of the function).
Hence, find the absolute maximum and minimum values of the function.

We first apply the criterion of Theorem I. We have

$$\frac{\partial f}{\partial x} = 2y + 3x(1 - x^2 - y^2)^{1/2},$$

$$\frac{\partial f}{\partial y} = 2x + 3y(1 - x^2 - y^2)^{1/2}.$$

The interior points of the region are those for which $x^2 + y^2 < 1$. The interior
points which might furnish a relative maximum or minimum are among those
which we find by solving the equations

$$3x(1 - x^2 - y^2)^{1/2} = -2y, \qquad 3y(1 - x^2 - y^2)^{1/2} = -2x. \tag{6.3-2}$$

An obvious solution of these equations is $x = 0$, $y = 0$. If neither $x$ nor $y$ is zero
we may divide one equation by the other and obtain the result

$$\frac{x}{y} = \frac{y}{x}, \quad \text{or} \quad x^2 = y^2.$$

Hence, substituting back in the first equation of (6.3-2) after squaring both sides,
we obtain

$$9x^2(1 - 2x^2) = 4x^2, \quad \text{or} \quad 9 - 18x^2 = 4.$$

Thus we find $x^2 = y^2 = \tfrac{5}{18}$. Going back again to (6.3-2) we have
$(1 - x^2 - y^2)^{1/2} = \tfrac23$ and hence $3x(\tfrac23) = -2y$, or $x = -y$. Note that $x = y$ is
ruled out. There are therefore three points in all which satisfy (6.3-2). They are

$$P_0 = (0, 0), \quad P_1 = \left(\sqrt{\tfrac{5}{18}}, -\sqrt{\tfrac{5}{18}}\right), \quad P_2 = \left(-\sqrt{\tfrac{5}{18}}, \sqrt{\tfrac{5}{18}}\right).$$

The accompanying table of values may now be constructed:

    Point              Value of f

    P_0                -1

    P_1 and P_2        -23/27

We emphasize that Theorem I does not assert that the function has relative
extrema at all three of these points; it only states that if any relative extrema
occur at interior points, such extrema are found among these three points.
Before drawing any conclusions about absolute extrema we must investigate the
behavior of the function on the boundary of the region. On the boundary we
have $x^2 + y^2 = 1$ and therefore $f(x, y) = 2xy$. To look for extreme values of $f$ on
the boundary we might solve for $y$: $y = \pm\sqrt{1 - x^2}$, and look for the extremes of
$\pm 2x\sqrt{1 - x^2}$ by the methods of elementary calculus. We find such extremes when
$x^2 = \tfrac12$. A more elegant procedure is to introduce the parametric equations $x = \cos\theta$,
$y = \sin\theta$ for the boundary circle (here $\theta$ is the usual angle of polar co-ordinates).
Then $2xy = 2\cos\theta\sin\theta = \sin 2\theta$, and we see that the values range between

$$-1 \ \left(\text{at } \theta = \tfrac{3\pi}{4} \text{ or } \tfrac{7\pi}{4}\right) \quad \text{and} \quad +1 \ \left(\text{at } \theta = \tfrac{\pi}{4} \text{ or } \tfrac{5\pi}{4}\right).$$

We have now found four more points which must be considered along with the
original three when we look for the absolute minimum and maximum values of
the function in the closed region. When we compare the values $\pm 1$ with the
values at the points listed in the table, we see that the function has the absolute
maximum value $+1$, and the absolute minimum value $-1$. The maximum occurs
at the two boundary points $(\sqrt2/2, \sqrt2/2)$, $(-\sqrt2/2, -\sqrt2/2)$. The minimum occurs
at the interior point $P_0$ and at the two boundary points $(\sqrt2/2, -\sqrt2/2)$,
$(-\sqrt2/2, \sqrt2/2)$. Our work has not settled the question as to whether the interior
points $P_1$, $P_2$ are points of relative extrema or saddle points. They are in fact saddle
points, as may be shown by an examination of the function in polar co-ordinates.
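
The values found in Example 2 can be checked numerically. The following is only a sketch: the sampling of the boundary circle at 3600 equally spaced angles is an arbitrary choice made for this illustration, and the guard against negative round-off is an implementation detail, not part of the example.

```python
import math

def f(x, y):
    # max(..., 0.0) guards against tiny negative round-off on the boundary.
    return 2 * x * y - max(1 - x * x - y * y, 0.0) ** 1.5

# The three interior critical points P0, P1, P2.
s = math.sqrt(5 / 18)
interior = {"P0": f(0, 0), "P1": f(s, -s), "P2": f(-s, s)}

# On the boundary f reduces to sin(2*theta); sample the unit circle.
n = 3600
boundary = [
    f(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
    for k in range(n)
]

values = list(interior.values()) + boundary
print(interior)
print(min(values), max(values))
```

The interior values reproduce the table, and the sampled extremes agree with the absolute minimum $-1$ and absolute maximum $+1$ to within round-off.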

Example 3. A shelter for use at the beach is to be built in the form of a 
box-like space with canvas covering on the top, back, and ends. If 96 square feet 
of canvas are available, what should be the dimensions of the shelter to give it 
maximum cubic content? 

Let the shelter be y feet between ends, x feet from front to back, and z feet 
high. Its volume is V = xyz. The area to be covered by canvas is 

A = 2xz + xy + yz. 

Since $A = 96$, we can use this last equation to eliminate one variable, say $y$:

$$y = \frac{96 - 2xz}{x + z}, \qquad V = xz\,\frac{96 - 2xz}{x + z}.$$

Here $V$ is expressed in terms of the independent variables $x$, $z$; we can set
$\partial V/\partial x = \partial V/\partial z = 0$ and solve for $x$ and $z$. An alternative procedure which is in some
ways preferable is the following: Differentiate both of the equations

$$V = xyz, \qquad A = 2xz + xy + yz$$

with respect to $x$ and $z$, regarding $y$ as a function of $x$ and $z$. Differentiation with
respect to $x$ gives

$$0 = \frac{\partial V}{\partial x} = yz + xz\frac{\partial y}{\partial x}, \qquad 0 = 2z + x\frac{\partial y}{\partial x} + y + z\frac{\partial y}{\partial x}.$$

We now eliminate $\partial y/\partial x$ between these two equations:

$$\frac{\partial y}{\partial x} = -\frac{y}{x}, \qquad 0 = 2z + x\left(-\frac{y}{x}\right) + y + z\left(-\frac{y}{x}\right),$$

$$2z - y + y - \frac{yz}{x} = 0, \qquad 2x = y.$$

Since $x$ and $z$ enter symmetrically, we infer that $2z = y$ also, and hence that
$x = z$. To get the values of $x$, $y$, $z$ we return to the formula for $A$. We now have

$$96 = 2x(x) + x(2x) + (2x)x = 6x^2.$$

Hence $x^2 = 16$, $x = z = 4$, $y = 8$. The volume of the shelter of these dimensions is
128 cubic feet.


One logical issue still remains to be settled in the foregoing "solution" of the
problem posed in Example 3. How do we know that we really found the
dimensions which yield maximum volume? Our method was based on two
assumptions: First, that there is a shelter of maximum volume under the given
conditions, and second, that when $V$ is expressed as a function of the independent
variables $x$, $z$, the equations $\partial V/\partial x = \partial V/\partial z = 0$ are satisfied when $V$ attains
its maximum. If we can justify these two assumptions, our solution will be fully
established. Let us then consider $V$ as a function of $x$ and $z$. The formula is

$$V = 2xz\,\frac{48 - xz}{x + z}. \tag{6.3-3}$$

We do not consider all values of $x$ and $z$, however, but only those which have a
meaning for the problem under consideration. Thus we must have $x \ge 0$, $z \ge 0$.
We must have $xz \le 48$ also, since a negative volume would have no meaning.
The region $R$ in which we consider the values of $V$ is thus composed of all points
of the $xz$-plane for which $x \ge 0$, $z \ge 0$, $xz \le 48$, except the one point $x = 0$, $z = 0$.
This point is ruled out, since $V$ is not defined there. The region $R$ is
shown in Fig. 41.


[Fig. 41]

Let us now establish the fact that among all possible values of $V$ in the
region $R$, there is an absolute maximum value, and that this maximum occurs at
an interior point of $R$. Observe that $V$ is positive in the interior of $R$ and that
$V = 0$ at all points of the boundary of $R$ except the origin (where $V$ is not
defined). Now, from the fact that $xz \le 48$ in $R$ we see that

$$V \le \frac{96 \cdot 48}{x + z},$$

and hence $V \to 0$ as a variable point $(x, z)$ of $R$ moves in such a way that either
$x \to \infty$ or $z \to \infty$. Finally, $V \to 0$ as $(x, z) \to (0, 0)$. For, $2xz \le x^2 + z^2$ and
$x + z \ge \sqrt{x^2 + z^2}$ are true inequalities (the latter when $x$ and $z$ are non-negative), and
therefore

$$V \le \frac{96(x^2 + z^2)}{\sqrt{x^2 + z^2}} = 96\sqrt{x^2 + z^2},$$

so that $V \to 0$ as $(x, z) \to (0, 0)$. From the foregoing arguments it is now clear that
if we form a new region $R_0$ by excluding from $R$ the points for which $x^2 + z^2 < \delta^2$
and $x^2 + z^2 > 1/\delta^2$, where $\delta$ is sufficiently small, the values of $V$ at these excluded
points will all be smaller than some of the values of $V$ in the remaining region
$R_0$. Since $R_0$ is a bounded and closed region in which $V$ is continuous, $V$ must
have a maximum value in $R_0$ (Theorem II, §5.3). The maximum value of $V$ in $R_0$
will also be a maximum value of $V$ in relation to all points of the larger region $R$.
Since this maximum is positive, it must occur at an interior point of $R$.

We now apply Theorem I to draw the conclusion that $\partial V/\partial x = \partial V/\partial z = 0$ at the
point where $V$ is a maximum. Since there turned out to be only one point in $R$,
namely $x = z = 4$, at which these conditions are satisfied, this point must be the
point where $V$ is a maximum.
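
As a rough numerical cross-check of Example 3, one can search a grid in the region $R$ for the largest value of $V$ given by (6.3-3). This is only a sketch: the grid spacing of 1/20 and the cutoff at 20 feet are arbitrary assumptions of the illustration, and a grid search is no substitute for the argument above.

```python
def volume(x, z):
    """V from (6.3-3): canvas area fixed at 96 square feet, y eliminated."""
    return 2 * x * z * (48 - x * z) / (x + z)

# Coarse grid over part of the region R (x > 0, z > 0, xz <= 48).
best = max(
    (volume(i / 20, j / 20), i / 20, j / 20)
    for i in range(1, 401)
    for j in range(1, 401)
    if (i / 20) * (j / 20) <= 48
)
v_max, x_best, z_best = best

# Recover y from the canvas constraint 2xz + xy + yz = 96.
y_best = (96 - 2 * x_best * z_best) / (x_best + z_best)
print(v_max, x_best, z_best, y_best)
```

The grid maximum falls at the critical point $x = z = 4$, with $y = 8$ and a volume of 128 cubic feet.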


EXERCISES 

1. Find the point of the plane x + 4y + 4z = 39 nearest the point (2, 0, 1). 

2. Find the greatest value of the function xy(c-x-y) in the closed triangular 
region with vertices (0,0), (c, 0), and (0, c). Assume c >0. 

3. Find the absolute maximum of $144x^3y^2(1 - x - y)$ in the first quadrant of the
$xy$-plane.

4. Does /(x, y) = x 2 + fxy + y 2 + (576/x) + (576/y) have any absolute extrema in the 
region x > 0, y > 0? If so, find where such extrema occur, and the type (maximum or 
minimum). Give all the supporting details of your argument. 

5. Find the absolute minimum value of

$$f(x, y) = x^2 + y^2 + \left(\frac{2A - ax - by}{c}\right)^2$$

where $A$, $a$, $b$, and $c$ are positive constants. All values of $x$ and $y$ are admitted. How do
you know that a minimum exists?

6. Find the absolute extreme values of the function $f(x, y) = 2xy + (1 - x^2 - y^2)^{1/2}$ in
the region $x^2 + y^2 \le 1$.

7. Find the absolute extreme values of the function $f(x, y) = xy - (1 - x^2 - y^2)^{1/2}$ in
the region $x^2 + y^2 \le 1$.

8. (a) Introducing polar co-ordinates, show that the function of Example 2 becomes
$f(x, y) = F(r, \theta) = r^2 \sin 2\theta - (1 - r^2)^{3/2}$. (b) Find the extreme values of $F$ for $-1 \le r \le 1$,
$\theta$ unrestricted, considering $F$ as defined in a region of the $r\theta$-plane. (c) At the points of
the $r\theta$-plane which correspond to the points $P_1$, $P_2$ of Example 2, show
$\partial F/\partial r = 0$, $\partial F/\partial\theta = 0$, $\partial^2 F/\partial r^2 < 0$, $\partial^2 F/\partial\theta^2 > 0$.
From these facts explain why these points are saddle points and not points
of relative extrema.
9. Solve Exercises 6 and 7 by the introduction of polar co-ordinates as independent 
variables. 

10. Find the greatest value of the function $\sin x \sin y \sin(x + y)$ in the closed
triangular region with vertices $(0, 0)$, $(\pi, 0)$, $(0, \pi)$.

11. Find the absolute maximum value of the function $(x^2 + 2y^2)e^{-(x^2 + y^2)}$, considering
all possible values of $x$ and $y$.

12. Find the absolute extrema of the function $3x^2 - 8xy - 4y^2 + 2x + 16y$ in the square
$0 \le x \le 2$, $0 \le y \le 2$.

13. Find the absolute extrema of the function $x^3 + y^3 + 3xy^2 - 15x - 15y$ in the square
$0 \le x \le 3$, $0 \le y \le 3$.

14. Find the minimum value of the function $(12/x) + (18/y) + xy$ in the first quadrant.
How do you know there is a minimum?

15. Find the maximum value of the function $(xy - 4y - 8x)/(x^2y^2)$ in the first quadrant.
How do you know there is a maximum?

16. A rectangular box without a top has length $x$, width $y$, and depth $z$. The combined
area of the sides and bottom is fixed as $S$ square feet. Express the volume $V$ of the box as a
function of $x$, $y$, and show that $V$ is greatest when $x = y = (S/3)^{1/2}$, $z = x/2$. Justify your
solution completely.

17. Consider the function $\sqrt{x^2 + y^2} + \sqrt{(x - 1)^2 + y^2}$. (a) Explain carefully why this
function must have an absolute minimum at some point of the plane. (b) What is the
minimum and where does it occur? (c) Are all the minimum points found by setting the
partial derivatives equal to zero?

18. Consider the function $f(x, y) = |y| + \sqrt{x^2 + (y - 1)^2}$.
(a) At what points do one or both of the first partial derivatives of $f$ fail to exist? (b) Find all
points where they both exist and are equal to zero. (c) What is the absolute minimum value
of $f$, and at what points does it occur?

19. For what position of the point (x, y) is the sum of the distance from (x, y) to the 
x-axis and twice the distance from (x, y) to the point (0, 1) a minimum? 

20. Let $f(x, y, z)$ be the sum of the three distances: from $(x, y, z)$ to the $y$-axis, from
$(x, y, z)$ to the $z$-axis, and from $(x, y, z)$ to the point $(1, 0, 0)$. Find the absolute minimum value
of $f$, and where it occurs.


6.4 / DIFFERENTIALS 

Our purpose in this section is to define what is meant by saying that a
real-valued function of several real variables is differentiable and to define the
differential of such a function. These definitions are needed in order that we may
derive the chain rules for differentiating composite functions (Theorem III, §6.5 and
Theorem V, §7.3). We stress the case of two variables, but the ideas apply to
functions of three or more variables, as we shall see.

In §1.3 we defined the differential of a function $f$ of the independent variable $x$
as the function of $x$ and an independent variable $dx$ whose value is $f'(x)\,dx$. The
differential of $f$ is defined at each point $x$ where $f$ has a derivative. The independent
variable $dx$ can have any value. For a fixed value of $x$ the value of the differential is
a multiple of $dx$, the multiplier being the value $f'(x)$ of the derivative of $f$ at $x$. It is
useful to restate the definition of differentiability as follows: the function $f$ is called
differentiable at $x$ if it is defined at $x$ and all points near $x$ and if there exists a
number $C$ such that


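$$\lim_{\Delta x \to 0} \frac{|f(x + \Delta x) - f(x) - C\,\Delta x|}{|\Delta x|} = 0. \tag{6.4-1}$$

We can rewrite (6.4-1) in the equivalent form

$$\lim_{\Delta x \to 0} \left| \frac{f(x + \Delta x) - f(x)}{\Delta x} - C \right| = 0,$$

and this new form is in turn equivalent to the assertion

$$\lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} = C. \tag{6.4-2}$$

But (6.4-2) is the same as the assertion that $f$ has a derivative at $x$, with value
$f'(x) = C$.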

Let the functional symbol for the differential be $df$, and denote by $df(x; dx)$ the
value of $df$ as a function of $x$ and $dx$. We use a semicolon rather than a comma to
separate $x$ and $dx$ in order to emphasize the fact that dependence of $df$ on $dx$ is in
general different from its dependence on $x$; $df$ is a linear function of $dx$. We can
replace $dx$ by other symbols, such as $\Delta x$ or $h$. We can write (6.4-1) in the
form

$$\lim_{h \to 0} \frac{|f(x + h) - f(x) - df(x; h)|}{|h|} = 0. \tag{6.4-3}$$

This is because $C\,\Delta x = f'(x)\,\Delta x = df(x; \Delta x)$; we simply replace $\Delta x$ by $h$. The
characteristic features of the differential are: (1) that $df(x; h)$ is (for fixed $x$) a
multiple of $h$ and (2) that the limit relation (6.4-3) is valid. This limit relation can be
described by saying that $df(x; h)$ is a good approximation to $f(x + h) - f(x)$ in the
sense that the difference

$$f(x + h) - f(x) - df(x; h)$$

is small in comparison with $|h|$ as $|h| \to 0$.
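
The meaning of "small in comparison with $|h|$" can be illustrated numerically. The sketch below assumes the particular function $f(x) = \sin x$ at $x = 1$, which is not an example from the text; the helper name `ratio` is hypothetical. It evaluates the quotient in (6.4-3) for a decreasing sequence of values of $h$.

```python
import math

def ratio(f, dfdx, x, h):
    """|f(x+h) - f(x) - df(x; h)| / |h| with df(x; h) = f'(x) * h."""
    return abs(f(x + h) - f(x) - dfdx(x) * h) / abs(h)

# Assumed test case: f = sin, f' = cos, at x0 = 1.
x0 = 1.0
ratios = [ratio(math.sin, math.cos, x0, 10.0 ** (-k)) for k in range(1, 6)]
for k, r in zip(range(1, 6), ratios):
    print(f"h = 1e-{k}: ratio = {r:.3e}")
```

The printed ratios shrink roughly in proportion to $h$, which is the behavior the limit relation describes.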

The foregoing discussion of the one-variable case provides us with a
motivation for the definitions of differentiability and the differential for the
case of a function of two variables. For a function $f$ of $x$ and $y$ we want the
differential $df(x, y; dx, dy)$ to be a linear combination $A\,dx + B\,dy$, where $A$ and
$B$ are numbers determined by $f$ and $(x, y)$, and we want a limit relation
analogous to (6.4-3). In speaking of points near $(x, y)$ we represent them by
$(x + h, y + k)$, where the distance from $(x + h, y + k)$ to $(x, y)$ is $\sqrt{h^2 + k^2}$.

Definition. The function $f$ of $(x, y)$ is called differentiable at $(x, y)$ if it is
defined for all points near $(x, y)$ [that is, in a neighborhood of $(x, y)$] and if there
exist numbers $A$, $B$ [depending on $f$ and $(x, y)$] such that

$$\lim_{(h,k) \to (0,0)} \frac{|f(x + h, y + k) - f(x, y) - (Ah + Bk)|}{\sqrt{h^2 + k^2}} = 0. \tag{6.4-4}$$

The differential of $f$ at $(x, y)$ is then defined to be the function $df$ of $(x, y)$ and
$(dx, dy)$ with the value

$$df(x, y; dx, dy) = A\,dx + B\,dy. \tag{6.4-5}$$



Observe that the differential is a linear function of $dx$ and $dy$, that is, a linear
combination of them. The variables $dx$ and $dy$ can be assigned any values
whatsoever. If we set $z = f(x, y)$, the value of the differential is often denoted by
$dz$ as given by the formula

$$dz = \frac{\partial z}{\partial x}\,dx + \frac{\partial z}{\partial y}\,dy. \tag{6.4-6}$$

As an immediate consequence of the definitions we can see that when $f$ is
differentiable at $(x, y)$ the partial derivatives $\dfrac{\partial f}{\partial x}$ and $\dfrac{\partial f}{\partial y}$ exist at $(x, y)$ and are
given by the formulas

$$\frac{\partial f}{\partial x} = f_1(x, y) = A, \qquad \frac{\partial f}{\partial y} = f_2(x, y) = B. \tag{6.4-7}$$


Take $k = 0$, $h \ne 0$ in (6.4-4) and let $h \to 0$. The result is

$$\lim_{h \to 0} \frac{f(x + h, y) - f(x, y) - Ah}{h} = 0,$$

which is equivalent to

$$\lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h} = A.$$

Therefore, by definition, $A$ is the partial derivative of $f$ with respect to $x$ at $(x, y)$. The result for $B$
is obtained in the same way. It follows that when $f$ is differentiable at $(x, y)$ the
pair of numbers $A$, $B$ satisfying (6.4-4) is unique.

It is important to observe that the requirement on $f$ of being differentiable at
a point is stronger than the requirement that $f$ have partial derivatives with
respect to $x$ and $y$ at the point. This fact is illustrated by the functions in
Exercises 2 and 7. In each case there the function is not differentiable at $(0, 0)$
and yet has first partial derivatives with respect to $x$ and $y$, respectively, there.

It will be proved later that if the first partial derivatives of $f$ exist throughout
a neighborhood of a point and are continuous at that point, then $f$ is differentiable
at the point. (See Theorem II, §7.1.) This simple criterion assures us that
most of the functions we ordinarily encounter are differentiable at most points.
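
The gap between having partial derivatives and being differentiable can be seen numerically for the function $f(x, y) = \sqrt{|xy|}$ of Exercise 7. The following is only a sketch; the helper name `quotient` and the sample step sizes are assumptions of the illustration. It evaluates the quotient of (6.4-4) at the origin with $A = B = 0$, the only candidates allowed by (6.4-7).

```python
import math

def f(x, y):
    return math.sqrt(abs(x * y))

def quotient(h, k, A=0.0, B=0.0):
    """The quotient in (6.4-4) at (0, 0) for candidate coefficients A, B."""
    return abs(f(h, k) - f(0.0, 0.0) - (A * h + B * k)) / math.hypot(h, k)

# f vanishes on both axes, so both difference quotients at (0, 0) are 0.
f1 = (f(1e-8, 0.0) - f(0.0, 0.0)) / 1e-8
f2 = (f(0.0, 1e-8) - f(0.0, 0.0)) / 1e-8

# Along the line k = h the quotient stays at 1/sqrt(2) instead of tending to 0.
along_diagonal = [quotient(t, t) for t in (1e-1, 1e-3, 1e-6)]
print(f1, f2, along_diagonal)
```

Since the quotient does not approach 0 along the line $k = h$, the limit relation (6.4-4) fails even though both first partial derivatives exist at the origin.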

The following theorem is useful (in the proof of Theorem V, §7.3, for 
example). 


THEOREM II. If a function is differentiable at a point, it is continuous there.


Proof. Let us define a function $u$ of $(h, k)$ by the formula

$$u(h, k) = \frac{f(x + h, y + k) - f(x, y) - (Ah + Bk)}{\sqrt{h^2 + k^2}}$$

when $(h, k) \ne (0, 0)$. Then we see from (6.4-4) that the limit of $u(h, k)$ as
$(h, k) \to (0, 0)$ is 0. From the definition of $u$ it follows that

$$f(x + h, y + k) - f(x, y) = Ah + Bk + u(h, k)\sqrt{h^2 + k^2}$$

and from this we see at once that

$$\lim_{(h,k) \to (0,0)} [f(x + h, y + k) - f(x, y)] = 0.$$

But by §5.3 this means that $f$ is continuous at $(x, y)$.

It follows from the theorem that if $f$ is not continuous at $(x, y)$ it cannot be
differentiable there.

A function can be continuous without being differentiable, as the following 
example shows. 

Example 1. Let $f(x, y) = \sqrt{x^2 + y^2}$. This function is continuous at all points,
$(0, 0)$ included. But it is not differentiable at $(0, 0)$. In fact, it does not even have
first partial derivatives at $(0, 0)$. To verify this consider the ratio (for $h \ne 0$)

$$\frac{f(h, 0) - f(0, 0)}{h} = \frac{\sqrt{h^2} - 0}{h} = \frac{|h|}{h}.$$

This ratio is 1 if $h > 0$ and $-1$ if $h < 0$. Hence it has no limit as $h \to 0$, and the partial
derivative $\dfrac{\partial f}{\partial x}$ does not exist at $(0, 0)$. The argument is the same as regards $\dfrac{\partial f}{\partial y}$, by
symmetry.

We shall presently give an example to illustrate the property of the differential
expressed in (6.4-4). For the details of the example and other problems it is useful to
know the following inequalities:

$$2|ab| \le a^2 + b^2; \tag{6.4-9}$$

$$\sqrt{a^2 + b^2} \le |a| + |b| \le \sqrt2\,\sqrt{a^2 + b^2}. \tag{6.4-10}$$

The proof of (6.4-9) is simple:

$$0 \le (|a| - |b|)^2 = |a|^2 - 2|a|\,|b| + |b|^2 = a^2 - 2|ab| + b^2.$$

On transposing $2|ab|$ we obtain (6.4-9). The first part of (6.4-10) follows from the
obvious fact that

$$a^2 + b^2 \le a^2 + 2|a|\,|b| + b^2 = (|a| + |b|)^2.$$

The second part of (6.4-10) is obtained with the aid of (6.4-9):

$$(|a| + |b|)^2 = a^2 + 2|ab| + b^2 \le a^2 + (a^2 + b^2) + b^2 = 2(a^2 + b^2).$$

Now extract square roots at each of the ends of the foregoing inequality.
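
A quick random spot-check of the inequalities (6.4-9) and (6.4-10) is sketched below. It is not a proof, only a sanity check; the sample range, the fixed seed, and the round-off tolerance `tol` are assumptions of this illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the run is reproducible
pairs = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(1000)]
tol = 1e-12  # slack for floating-point round-off

# (6.4-9): 2|ab| <= a^2 + b^2
ok_9 = all(2 * abs(a * b) <= a * a + b * b + tol for a, b in pairs)

# (6.4-10): sqrt(a^2 + b^2) <= |a| + |b| <= sqrt(2) * sqrt(a^2 + b^2)
ok_10 = all(
    math.hypot(a, b) <= abs(a) + abs(b) + tol
    and abs(a) + abs(b) <= math.sqrt(2) * math.hypot(a, b) + tol
    for a, b in pairs
)
print(ok_9, ok_10)
```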

Example 2. Let $f(x, y) = 3x^2y + 2xy^2 + 1$. Find the differential at $x = 1$, $y = 2$
after verifying that the limit relation (6.4-4) holds for this case.

We use (6.4-7) to find $A$ and $B$. We have

$$\frac{\partial f}{\partial x} = 6xy + 2y^2, \qquad \frac{\partial f}{\partial y} = 3x^2 + 4xy.$$

Evaluating at $(1, 2)$, we find $f(1, 2) = 15$, $f_1(1, 2) = 20$, $f_2(1, 2) = 11$, so we form the
expression

$$f(1 + h, 2 + k) - f(1, 2) - (20h + 11k)$$

$$= 3(1 + h)^2(2 + k) + 2(1 + h)(2 + k)^2 + 1 - 15 - (20h + 11k).$$

Upon expansion and simplification, we find that the expression reduces to

$$2(3h^2 + 7hk + k^2) + hk(3h + 2k).$$

Thus, for this case, the expression on the left side of (6.4-4) becomes

$$\lim_{(h,k) \to (0,0)} \frac{|2(3h^2 + 7hk + k^2) + hk(3h + 2k)|}{\sqrt{h^2 + k^2}}.$$

Now, clearly, $3h^2 + k^2 \le 3(h^2 + k^2)$. Also, by (6.4-9), $2|hk| \le h^2 + k^2$. Therefore, the
fraction whose limit we are considering is not larger than

$$\frac{(h^2 + k^2)\left(13 + \tfrac12|3h + 2k|\right)}{\sqrt{h^2 + k^2}},$$

which equals

$$\sqrt{h^2 + k^2}\,\left(13 + \tfrac12|3h + 2k|\right).$$

This clearly approaches 0 as $(h, k) \to (0, 0)$, so we are through with the
verification. The differential at $(1, 2)$ is

$$df(1, 2; h, k) = 20h + 11k.$$
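
The verification in Example 2 can also be watched numerically. The sketch below evaluates the quotient of (6.4-4) at $(1, 2)$ with $A = 20$, $B = 11$ and compares it with the bound derived in the example; the sample radii and directions are arbitrary assumptions of the illustration, and the helper names are hypothetical.

```python
import math

def f(x, y):
    return 3 * x**2 * y + 2 * x * y**2 + 1

def quotient(h, k):
    """Quotient of (6.4-4) at (1, 2) with A = 20, B = 11."""
    return abs(f(1 + h, 2 + k) - f(1, 2) - (20 * h + 11 * k)) / math.hypot(h, k)

def bound(h, k):
    """The bound sqrt(h^2 + k^2) * (13 + |3h + 2k| / 2) from the example."""
    return math.hypot(h, k) * (13 + 0.5 * abs(3 * h + 2 * k))

# Points (h, k) at three shrinking radii and five assumed directions.
samples = [
    (t * math.cos(a), t * math.sin(a))
    for t in (1e-1, 1e-2, 1e-3)
    for a in (0.3, 1.2, 2.5, 4.0, 5.5)
]
for h, k in samples:
    print(f"{math.hypot(h, k):.0e}  {quotient(h, k):.3e}  {bound(h, k):.3e}")
```

In every sampled direction the quotient stays below the bound and shrinks with the radius, in line with the limit relation.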

We saw in Fig. 12 of §1.3 that for the function $f$, when we consider the
differential of $f$ at $(x, y)$, the relationship between $dx$ and $dy$ is this: as $dx$ varies,
the point $(x + dx, y + dy)$ moves along the line tangent to the graph at $(x, y)$.
There is a similar relationship in the case of a function of two variables. Let us
assume that $f$ is continuous, and consider the surface $S$ defined by $z = f(x, y)$.
To say that $f$ is differentiable at a particular point $(x_0, y_0)$ can be interpreted
geometrically. The implication of differentiability is that at the point $(x_0, y_0, z_0)$,
where $z_0 = f(x_0, y_0)$, the surface $S$ has a tangent plane not parallel to the $z$-axis.
The equation of this plane is

$$z - z_0 = A(x - x_0) + B(y - y_0), \tag{6.4-11}$$

where $A$ and $B$ are the values of $\dfrac{\partial f}{\partial x}$ and $\dfrac{\partial f}{\partial y}$, respectively, at $(x_0, y_0)$. To show
that this plane is indeed the tangent plane, we need a definition of what is meant
by saying that a plane is tangent to a surface at a certain point. Let $P_0$ be the
point $(x_0, y_0, z_0)$ and let $M$ be a plane containing $P_0$. We define $M$ to be the
tangent plane to $S$ at $P_0$ if, when $P$ is a point of $S$ different from $P_0$, the angle
between the line $P_0P$ and the plane $M$ approaches 0 as $P$ approaches $P_0$. We
shall show that this condition is fulfilled by the plane (6.4-11) as a consequence
of the differentiability of $f$ at $(x_0, y_0)$.

For this purpose we anticipate a result from Chapter 10 that may already be
familiar to the reader, namely that the cosine of the angle $\theta$ between two vectors is
equal to the dot product of the two vectors divided by the product of their
lengths. See (10.2-3). We take $\theta$ to be the nonobtuse angle between the line $P_0P$
and the normal to the plane (6.4-11) at $P_0$. Because $\theta$ is the complement of the
angle between $P_0P$ and the plane, we have to show that $\theta \to \pi/2$, or equivalently,
that $\cos\theta \to 0$. Now, let $P$ be the point on $S$ determined by $x = x_0 + h$, $y = y_0 + k$,
where $h$ and $k$ are small and not both zero. The vector $P_0P$ has components $h$, $k$,
$\Delta z$, where

$$\Delta z = f(x_0 + h, y_0 + k) - f(x_0, y_0). \tag{6.4-12}$$

A unit vector normal to the plane (6.4-11) has components $A/d$, $B/d$, $-1/d$, where

$$d = \pm(A^2 + B^2 + 1)^{1/2},$$

and the sign of $d$ is chosen so that the angle between $P_0P$ and the normal is
nonobtuse. Then (using formula (10.21-3) for the dot product),

$$\cos\theta = \frac{|Ah + Bk - \Delta z|}{(A^2 + B^2 + 1)^{1/2}\,[h^2 + k^2 + (\Delta z)^2]^{1/2}}. \tag{6.4-13}$$

Because $A$ and $B$ are constants we see from (6.4-13) that we have to prove that

$$\lim_{(h,k) \to (0,0)} \frac{|Ah + Bk - \Delta z|}{[h^2 + k^2 + (\Delta z)^2]^{1/2}} = 0. \tag{6.4-14}$$

Now, considering (6.4-12), we see that the fraction in (6.4-14) is not larger than

$$\frac{|f(x_0 + h, y_0 + k) - f(x_0, y_0) - (Ah + Bk)|}{(h^2 + k^2)^{1/2}}. \tag{6.4-15}$$

But the limit of the expression in (6.4-15) is 0 as a consequence of the
differentiability of $f$ at $(x_0, y_0)$. Therefore (6.4-14) must be true.

It can be shown, conversely, that if (6.4-14) is true, then the limit of the
expression in (6.4-15) is 0, and hence $f$ is differentiable at $(x_0, y_0)$ if and only if
the plane (6.4-11) is tangent to $S$ at $P_0$. We forego the details of the proof of
this converse.

Now that we know the relationship between differentiability and the tangent
plane, it is easy to see how a function $f$ can have both partial derivatives
$f_1(a, b)$, $f_2(a, b)$ existing and yet fail to be differentiable at $(a, b)$. All that is
necessary is to have a surface $z = f(x, y)$ such that it has no tangent plane at
$x = a$, $y = b$, and yet such that the surface is cut by the two planes $x = a$, $y = b$ in
curves which have tangents at the point in question. The cone $z = \sqrt{x^2 + y^2}$
does not have a tangent plane at its vertex (the origin). The paraboloid
$z = x^2 + y^2$ has the tangent plane $z = 0$ at the origin. We can readily imagine a
surface which coincides with the paraboloid where the latter intersects the planes
$x = 0$, $y = 0$, and which coincides with the cone where the cone intersects the
planes $y = \pm x$. Such a surface $z = f(x, y)$ will have $f_1(0, 0) = 0$, $f_2(0, 0) = 0$, but
it will not have a tangent plane at the origin (see Fig. 42). One such surface is
defined by the equation

$$z = (|x| - |y|)^2 + \frac{2|xy|}{\sqrt{x^2 + y^2}} \quad (\text{with } z = 0 \text{ when } x = y = 0).$$

[Fig. 42]


For functions of more than two variables the definitions of differentiability
and differentials follow the model of the two-variable case. For a function $f$ of
the $n$ variables $x_1, \ldots, x_n$ we want the differential of $f$ at $(x_1, \ldots, x_n)$ to have as
its value a linear combination

$$df(x_1, \ldots, x_n; h_1, \ldots, h_n) = A_1h_1 + \cdots + A_nh_n \tag{6.4-16}$$

of the independent variables $h_1, \ldots, h_n$, and we want this linear combination to
be a suitable approximation to the difference

$$f(x_1 + h_1, \ldots, x_n + h_n) - f(x_1, \ldots, x_n)$$

when all the $h_i$'s are sufficiently small. To express this precisely let us for
convenience write

$$\|h\| = (h_1^2 + \cdots + h_n^2)^{1/2}.$$

Then $\|h\| \to 0$ means that each of the $h_i$'s approaches 0. We now define $f$ to be
differentiable at $(x_1, \ldots, x_n)$ if there exist $n$ numbers $A_1, \ldots, A_n$, depending on $f$
and on the $x_i$'s, such that

$$\lim_{\|h\| \to 0} \frac{|f(x_1 + h_1, \ldots, x_n + h_n) - f(x_1, \ldots, x_n) - (A_1h_1 + \cdots + A_nh_n)|}{\|h\|} = 0.$$

When this condition is satisfied we define the differential of $f$ by (6.4-16).

Just as in the two-variable case, we see that, when $f$ is differentiable at
$(x_1, \ldots, x_n)$, $f$ has first partial derivatives given by

$$\frac{\partial f}{\partial x_i} = A_i, \qquad i = 1, 2, \ldots, n.$$

If we use a dependent variable $u$ to denote the value of $f$, then $du$ is defined
as a function of $x_1, \ldots, x_n$ and $dx_1, \ldots, dx_n$ by the formula

$$du = \frac{\partial f}{\partial x_1}\,dx_1 + \cdots + \frac{\partial f}{\partial x_n}\,dx_n.$$

As a matter of technique in using differentials, it is important to know the
standard formulas

$$d(u + v) = du + dv,$$

$$d(uv) = u\,dv + v\,du,$$

$$d\left(\frac{u}{v}\right) = \frac{v\,du - u\,dv}{v^2}.$$

Here $u$ and $v$ may represent differentiable functions of several independent
variables. Suppose, for example, that

$$u = f(x, y), \qquad v = g(x, y).$$

Then, by definition,

$$du = \frac{\partial u}{\partial x}\,dx + \frac{\partial u}{\partial y}\,dy, \qquad dv = \frac{\partial v}{\partial x}\,dx + \frac{\partial v}{\partial y}\,dy, \qquad d(uv) = \frac{\partial(uv)}{\partial x}\,dx + \frac{\partial(uv)}{\partial y}\,dy.$$

But

$$\frac{\partial(uv)}{\partial x} = u\frac{\partial v}{\partial x} + v\frac{\partial u}{\partial x},$$

with a similar formula for $\dfrac{\partial(uv)}{\partial y}$. With these relations before him the student can
readily write out the work necessary to verify the relation $d(uv) = u\,dv + v\,du$.
We likewise have such formulas as

$$d\,e^u = e^u\,du,$$

$$d \sin u = \cos u\,du,$$

$$d \tan^{-1} u = \frac{du}{1 + u^2},$$

where $u$ is any differentiable function of several variables. Since these are all
exactly the same in appearance as the formulas of elementary calculus, we
refrain from presenting a formal list.

All the formulas just considered afford us special instances of the following
principle: A differentiable function of differentiable functions is differentiable. A
fully precise statement and proof of the theorem embodying this principle will be
given later (see §7.3). For the present we observe that it covers such statements
as the following:

Let $u = f(x, y)$ and $v = g(x, y)$ be differentiable functions of $x$, $y$. Let $F(u, v)$
be a differentiable function of $u$, $v$, and let $w = F(f(x, y), g(x, y))$. Then $w$ is a
differentiable function of $x$, $y$, and

$$dw = \frac{\partial F}{\partial u}\,du + \frac{\partial F}{\partial v}\,dv$$

when the right side of this last equation is expressed entirely in terms of $x$, $y$, $dx$,
and $dy$.

A particular consequence of the principle enunciated in the previous
paragraph is that the form of the relation

$$df(x, y) = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy$$

is valid even when $x$ and $y$ are not independent variables, but are differentiable
functions of other variables. This invariance of form is one of the very important
properties of differentials.

Example 2. If $z = \log(x^2 + y^2)$, find $dz$.

We use the formula

$$d \log u = \frac{du}{u}$$

with $u = x^2 + y^2$. The result is

$$dz = \frac{d(x^2 + y^2)}{x^2 + y^2} = \frac{2x\,dx + 2y\,dy}{x^2 + y^2}.$$

The coefficient of $dx$ here is $\dfrac{\partial z}{\partial x}$, and that of $dy$ is $\dfrac{\partial z}{\partial y}$, as the student should
verify.


Example 3. Let $a$, $b$, $c$ be the sides of a triangle, and let $\theta$ be the angle
opposite the side $c$. Regarding $c$ as a function of $a$, $b$, and $\theta$, find the differential
$dc$. Use the result to find $c$ approximately when $a = 6.20$, $b = 5.90$, and $\theta = 58°$.
By the law of cosines we have

$$c^2 = a^2 + b^2 - 2ab\cos\theta.$$

Hence (using radian measure for $\theta$),

$$2c\,dc = 2a\,da + 2b\,db + 2ab\sin\theta\,d\theta - 2a\cos\theta\,db - 2b\cos\theta\,da. \tag{6.4-20}$$

This equation permits us to calculate $dc$ when $a$, $b$, $\theta$, $da$, $db$, $d\theta$ are known. To
solve the numerical problem proposed in the example we start from a triangle
with $a = 6$, $b = 6$, $\theta = \pi/3$ (60°). Then $\cos\theta = \tfrac12$ and $c = 6$. We are interested in $\Delta c$
when $\Delta a = 0.20$, $\Delta b = -0.10$, and $\Delta\theta = -\pi/90$ radians (the equivalent of $-2°$).
Hence we set $da = 0.20$, $db = -0.10$, $d\theta = -\pi/90$, and use $dc$ as an approximation
for $\Delta c$. We obtain from (6.4-20), after dividing by 2,

$$6\,dc = 1.20 - 0.60 + 36\left(\frac{\sqrt3}{2}\right)\left(-\frac{\pi}{90}\right) + 0.30 - 0.60,$$

$$6\,dc = -0.79, \qquad dc = -0.13.$$

Hence the new value of $c$ is approximately $6 - 0.13 = 5.87$. This result does in
fact agree with the exact value of $c$ to two decimal places.
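
The arithmetic of Example 3 can be reproduced in a few lines. This sketch simply evaluates (6.4-20) at the base triangle and compares the resulting approximation with the value of $c$ computed directly from the law of cosines; the variable names are choices made for this illustration.

```python
import math

# Base triangle and increments used in the example.
a, b, theta = 6.0, 6.0, math.pi / 3
da, db, dtheta = 0.20, -0.10, -math.pi / 90

c = math.sqrt(a * a + b * b - 2 * a * b * math.cos(theta))

# (6.4-20) divided by 2c gives dc.
dc = (a * da + b * db + a * b * math.sin(theta) * dtheta
      - a * math.cos(theta) * db - b * math.cos(theta) * da) / c

approx = c + dc
exact = math.sqrt((a + da) ** 2 + (b + db) ** 2
                  - 2 * (a + da) * (b + db) * math.cos(theta + dtheta))
print(round(dc, 2), round(approx, 2), round(exact, 2))
```

The differential approximation and the directly computed side agree to two decimal places, as the example states.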


EXERCISES 

1. Find the differential of f(x , y) in each of the following cases. 

(a) x 4 + y 4 + 6xy (c) (x 2 + y 2 )lxy 

(b) e x y (d) (x 2 +y 2 ) 3/2 

2. Suppose f(x , y) = - 2 for all (x, y) except (0,0), and let /(0,0) = 0. Show that 

x r y 

/i(0, 0) and / 2 (0, 0) both exist with values equal to 0 but that / is discontinuous at (0,0). 

3. (a) Show that

$$x\,d\left(\frac{x}{\sqrt{x^2 + y^2}}\right) + y\,d\left(\frac{y}{\sqrt{x^2 + y^2}}\right) = 0$$

for all values of $x$ and $y$ such that $x^2 + y^2 > 0$, and for all values of $dx$ and
$dy$. (b) Generalize (a) by proving that $\sum_{k=1}^{n} x_k\,d(x_k/r) = 0$ if $r = (x_1^2 + \cdots + x_n^2)^{1/2}$ and
$x_1, \ldots, x_n$ are independent variables.

4. Find $dz$ at $x = 1$, $y = \pi/2$ if $z = x^2y + e^x \sin y$. What is the resulting value of $dz$ if
$dx = \pi - e$, $dy = e^2$?

5. If $f(x, y) = (50 - x^2 - y^2)^{1/2}$, find an approximate value of the difference
$f(3, 4) - f(2.9, 4.1)$ by use of differentials.

6. If $u$ and $v$ are differentiable functions of $x$ and $y$, prove the formula

$$d\left(\frac{u}{v}\right) = \frac{v\,du - u\,dv}{v^2},$$

assuming that $v \ne 0$.

7. Let $f(x, y) = \sqrt{|xy|}$. (a) Verify that $f_1(0, 0) = f_2(0, 0) = 0$. (b) Does the surface
$z = f(x, y)$ have a tangent plane at $x = 0$, $y = 0$? Consider the section of the surface made
by the plane $x = y$. (c) Verify that for this case the requirement (6.4-4), with $x = y = 0$,
cannot be satisfied, no matter how $A$ and $B$ are chosen.

8. Assuming that $f_1$ and $f_2$ both exist at $(x, y)$, prove that if

$$\lim_{(h,k) \to (0,0)} \frac{f(x + h, y + k) - f(x, y) - f_1(x, y)h - f_2(x, y)k}{\sqrt{h^2 + k^2}}$$

exists, the limit is 0.


6.5 / COMPOSITE FUNCTIONS AND THE CHAIN RULE

We use the term composite function for a function which is obtained from a 
given function of one or more variables by substituting other functions in place 
of the variables in the first function. As an example, consider the function 

$$F(x, y) = \sqrt{x^2 + y^2}.$$

It is a composite function which may be built up as follows: Let

$$f(u) = u^{1/2}, \qquad \phi(x, y) = x^2 + y^2.$$

On replacing $u$ by $\phi(x, y)$ we get

$$f(\phi(x, y)) = F(x, y).$$

As another example, consider the function

$$f(x, y) = xy + \log\frac{y}{x}.$$

In this function let us replace $x$ and $y$ by certain functions of new variables $r$, $\theta$,
as follows:

$$x = r\cos\theta, \qquad y = r\sin\theta.$$

The result is a function of $r$ and $\theta$:

$$F(r, \theta) = r^2 \sin\theta\cos\theta + \log\tan\theta.$$

Note that we may also write

$$F(r, \theta) = f(r\cos\theta, r\sin\theta).$$
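
The substitution just described can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `f`, `F_closed`, and `F_composed` and the sample point are chosen here only for illustration. It evaluates the closed form of F(r, θ) and the composition f(r cos θ, r sin θ) at one point with 0 < θ < π/2, where both logarithms are defined.

```python
import math

def f(x, y):
    # The second example of the text: f(x, y) = xy + log(y/x).
    return x * y + math.log(y / x)

def F_closed(r, theta):
    # Closed form obtained after the substitution x = r cos(theta), y = r sin(theta).
    return r**2 * math.sin(theta) * math.cos(theta) + math.log(math.tan(theta))

def F_composed(r, theta):
    # The same function written directly as a composite function.
    return f(r * math.cos(theta), r * math.sin(theta))

r, theta = 2.0, 0.6  # arbitrary sample point with 0 < theta < pi/2
print(F_closed(r, theta), F_composed(r, theta))
```

The two printed values should agree to within rounding error, which is all the identity F(r, θ) = f(r cos θ, r sin θ) asserts.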

Most of the functions we deal with are built up as composite functions. We
meet the concept of a composite function very early in the study of elementary
differential calculus; it is there that we learn the very important rule of
differentiation embodied in Theorem II, §1.11. We are now going to be concerned
with extensions of this rule for functions of several variables.

Let us first consider a function of three variables, say 

$$u = F(x, y, z). \tag{6.5-1}$$

We are going to suppose that the variables $x$, $y$, $z$ are made to depend upon a
variable $t$; let the notation for this dependence be

$$x = f(t), \qquad y = g(t), \qquad z = h(t). \tag{6.5-2}$$

On substituting these functions for $x$, $y$, $z$ in the function $u = F(x, y, z)$, we
obtain $u$ as a composite function of $t$; if we denote this function by $G(t)$, we
have

$$u = F(f(t), g(t), h(t)) = G(t). \tag{6.5-3}$$

The formula of differentiation for this composite function is

$$\frac{du}{dt} = \frac{\partial u}{\partial x}\frac{dx}{dt} + \frac{\partial u}{\partial y}\frac{dy}{dt} + \frac{\partial u}{\partial z}\frac{dz}{dt}. \tag{6.5-4}$$

We shall presently prove this formula under suitable hypotheses. It is very
important for the student to learn the structure of this formula; as a part of such
learning, he must grasp clearly the role of the variables in the notation of each
term in (6.5-4). In the term $\dfrac{\partial u}{\partial x}$, $u$ and $x$ are related by (6.5-1); $u$ is dependent,
and $x$ is one of the three independent variables $x$, $y$, $z$. In the term $\dfrac{dx}{dt}$, $x$ and $t$
are related by the first of equations (6.5-2); $x$ is dependent and $t$ is independent.
In the term $\dfrac{du}{dt}$, $u$ and $t$ are related by (6.5-3); $u$ is dependent and $t$ is
independent. An alternative notation for (6.5-4) is

$$\frac{dG}{dt} = \frac{\partial F}{\partial x}\frac{df}{dt} + \frac{\partial F}{\partial y}\frac{dg}{dt} + \frac{\partial F}{\partial z}\frac{dh}{dt}. \tag{6.5-5}$$


This notation is clearer and less subject to misunderstanding. However, both 
methods of writing the formula are widely used, and the student should become 
familiar with both of them. 

There are other varieties of composite functions in addition to the type
presented above. The function $F$ may depend on a different number of
variables (e.g., 2, or 4, 5, . . .). Also, the variables $x$, $y$, $z$ may depend upon more
than one variable. Suppose, for example, that we have

$$u = F(x, y), \qquad x = f(s, t), \qquad y = g(s, t). \tag{6.5-6}$$

Then, under suitable differentiability assumptions, if we write $G(s, t) =
F(f(s, t), g(s, t))$, we have the differentiation formulas

$$\frac{\partial G}{\partial s} = \frac{\partial F}{\partial x}\frac{\partial f}{\partial s} + \frac{\partial F}{\partial y}\frac{\partial g}{\partial s}, \qquad
\frac{\partial G}{\partial t} = \frac{\partial F}{\partial x}\frac{\partial f}{\partial t} + \frac{\partial F}{\partial y}\frac{\partial g}{\partial t}. \tag{6.5-7}$$

The general rule covering all formulas such as (6.5-5) and (6.5-7) is often
called the chain rule, or the composite-function rule.

To describe the situation generally, let us use the term first-class variables 
for the independent variables on which F depends, and the term second-class 
variables for the variables on which G depends. Observe that G is formed by 
replacing each first-class variable in F by a function of the second-class 
variables. In the differentiation formulas such as (6.5-7) we have as many 
different formulas as there are variables of the second class; each formula has as 
many terms as there are variables of the first class. 

We shall now formulate and prove a theorem about formulas such as (6.5-5) 
or (6.5-7). For simplicity we assume the situation is that represented in (6.5-6). 


THEOREM III. Let $F(x, y)$ be defined in some region $R$ of the $xy$-plane having
the point $(a, b)$ as an interior point, and let $F$ be differentiable at $(a, b)$. Let
$f(s, t)$, $g(s, t)$ be defined in some neighborhood of the point $(s_0, t_0)$ in the
$st$-plane. Let these functions admit first partial derivatives at $(s_0, t_0)$, and
suppose further that

$$a = f(s_0, t_0), \qquad b = g(s_0, t_0). \tag{6.5-8}$$

Then the composite function $G(s, t) = F(f(s, t), g(s, t))$ has first partial
derivatives at $(s_0, t_0)$, given by formulas (6.5-7), where $\dfrac{\partial F}{\partial x}$, $\dfrac{\partial F}{\partial y}$ are evaluated
at $(a, b)$, and the partial derivatives with respect to $s$, $t$ are evaluated at $s_0$, $t_0$.


Proof. It is to be emphasized that, although we are assuming that $F$ is
differentiable at $(a, b)$, we are not drawing a conclusion about differentiability of
$G$. Nor are we assuming differentiability of $f$ and $g$. We deal merely with partial
derivatives of $f$, $g$, and $G$. This form of the chain rule is therefore a little different
from the chain rule for differentials given in Theorem V, §7.3.

The proof begins with use of the fact that $F$ is differentiable at $(a, b)$. This
means that the ratio

$$\frac{F(a + \Delta x, b + \Delta y) - F(a, b) - F_1(a, b)\Delta x - F_2(a, b)\Delta y}{\sqrt{(\Delta x)^2 + (\Delta y)^2}} \tag{6.5-9}$$

approaches 0 as $\Delta x \to 0$ and $\Delta y \to 0$. Here we assume that $\Delta x$ and $\Delta y$ are small,
but not both 0. Let us define $\epsilon$ (depending on $\Delta x$ and $\Delta y$) to be the value of the
fraction in (6.5-9) if the denominator is not 0; if $\Delta x = \Delta y = 0$ we define $\epsilon = 0$.
Then we can write

$$F(a + \Delta x, b + \Delta y) - F(a, b) = F_1(a, b)\Delta x + F_2(a, b)\Delta y + \epsilon\sqrt{(\Delta x)^2 + (\Delta y)^2}. \tag{6.5-10}$$

This is true even if $\Delta x = \Delta y = 0$. It is clear from the definition of $\epsilon$ that $\epsilon \to 0$ as
$\Delta x \to 0$ and $\Delta y \to 0$.

Now we undertake to prove the first formula in (6.5-7). The second formula
is proved in the same way. For small $\Delta s$ $(\neq 0)$ let us write

$$\begin{aligned}
\Delta x &= f(s_0 + \Delta s, t_0) - f(s_0, t_0),\\
\Delta y &= g(s_0 + \Delta s, t_0) - g(s_0, t_0),\\
\Delta u &= F(a + \Delta x, b + \Delta y) - F(a, b).
\end{aligned} \tag{6.5-11}$$

Then it follows from (6.5-8) and the definition of $G$ that

$$\Delta u = G(s_0 + \Delta s, t_0) - G(s_0, t_0),$$

so that, by definition,

$$\lim_{\Delta s \to 0} \frac{\Delta u}{\Delta s} = G_1(s_0, t_0)$$

if the limit exists. Now, from (6.5-10) and the third formula in (6.5-11) we see
that

$$\frac{\Delta u}{\Delta s} = F_1(a, b)\frac{\Delta x}{\Delta s} + F_2(a, b)\frac{\Delta y}{\Delta s} + \epsilon\,\frac{\sqrt{(\Delta x)^2 + (\Delta y)^2}}{\Delta s}.$$

We are now ready to take limits on both sides of this equation as $\Delta s \to 0$. On the
right we have

$$\frac{\Delta x}{\Delta s} \to f_1(s_0, t_0), \qquad \frac{\Delta y}{\Delta s} \to g_1(s_0, t_0).$$

Also $\epsilon \to 0$ because $\Delta x \to 0$ and $\Delta y \to 0$. [The fact that $\Delta x \to 0$ is a consequence of
the continuity of $f(s, t_0)$ as a function of $s$ alone, at $s = s_0$; see Theorem I, §1.11.
The reasoning for $\Delta y \to 0$ is the same.] Finally,

$$\left|\frac{\sqrt{(\Delta x)^2 + (\Delta y)^2}}{\Delta s}\right| = \left[\left(\frac{\Delta x}{\Delta s}\right)^2 + \left(\frac{\Delta y}{\Delta s}\right)^2\right]^{1/2}$$

has a definite limit. Thus we see that

$$G_1(s_0, t_0) = F_1(a, b)f_1(s_0, t_0) + F_2(a, b)g_1(s_0, t_0).$$

This is the same as the first formula in (6.5-7) when the partial derivatives are
expressed in the appropriate notations to show the values at $(a, b)$ and $(s_0, t_0)$.
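
A numerical spot check can make the structure of the first formula in (6.5-7) concrete. The sketch below is an illustration under stated assumptions, not part of the proof: the functions `F`, `f`, `g` are hypothetical samples, and `partial` is a helper introduced here that estimates partial derivatives by central differences. It compares the derivative of the composite function G, computed directly, with the chain-rule combination F_1 f_1 + F_2 g_1.

```python
import math

def partial(func, point, index, h=1e-6):
    """Central-difference estimate of the partial derivative of func
    with respect to argument number `index`, evaluated at `point`."""
    up, down = list(point), list(point)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2 * h)

# Hypothetical sample functions, chosen only for illustration.
def F(x, y):
    return x * x * y + math.sin(y)

def f(s, t):
    return s * t

def g(s, t):
    return s + t * t

def G(s, t):
    # The composite function G(s, t) = F(f(s, t), g(s, t)).
    return F(f(s, t), g(s, t))

def chain_rule_G1(s0, t0):
    # F_1(a, b) f_1(s0, t0) + F_2(a, b) g_1(s0, t0), with a = f(s0, t0), b = g(s0, t0).
    a, b = f(s0, t0), g(s0, t0)
    return (partial(F, (a, b), 0) * partial(f, (s0, t0), 0)
            + partial(F, (a, b), 1) * partial(g, (s0, t0), 0))

s0, t0 = 0.7, -0.3
print(partial(G, (s0, t0), 0), chain_rule_G1(s0, t0))
```

The two printed numbers should agree up to the error of the difference quotients. Note that the partial derivatives of F are taken at (a, b), while those of f and g are taken at (s_0, t_0), exactly as the theorem specifies.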


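Example 1. If $F(x, y)$ is a given differentiable function of $x$, $y$, and if we
introduce polar co-ordinates $r$, $\theta$ by writing $x = r\cos\theta$, $y = r\sin\theta$, then $F$ is
transformed into a function $G$ of $r$ and $\theta$:

$$G(r, \theta) = F(r\cos\theta, r\sin\theta). \tag{6.5-12}$$

Use the chain rule to find $\dfrac{\partial G}{\partial r}$ and $\dfrac{\partial G}{\partial \theta}$ in terms of $\dfrac{\partial F}{\partial x}$ and $\dfrac{\partial F}{\partial y}$.

Here the first-class variables are $x$, $y$ and the second-class variables are $r$, $\theta$.
We have

$$\frac{\partial x}{\partial r} = \cos\theta, \qquad \frac{\partial y}{\partial r} = \sin\theta, \qquad
\frac{\partial x}{\partial \theta} = -r\sin\theta, \qquad \frac{\partial y}{\partial \theta} = r\cos\theta.$$

The chain rule, in the form (6.5-7) with $r$, $\theta$ replacing $s$, $t$, then gives

$$\begin{aligned}
\frac{\partial G}{\partial r} &= \frac{\partial F}{\partial x}\cos\theta + \frac{\partial F}{\partial y}\sin\theta,\\
\frac{\partial G}{\partial \theta} &= -\frac{\partial F}{\partial x}\,r\sin\theta + \frac{\partial F}{\partial y}\,r\cos\theta.
\end{aligned} \tag{6.5-13}$$

To emphasize the meaning of these formulas, let us evaluate each part of them
for $r = 1$, $\theta = \pi/6$. The corresponding values of $x$, $y$ are $x = \sqrt{3}/2$, $y = \tfrac12$. Then,
using the subscript notation for partial derivatives, equations (6.5-13) become

$$\begin{aligned}
G_1\!\left(1, \frac{\pi}{6}\right) &= F_1\!\left(\frac{\sqrt3}{2}, \frac12\right)\frac{\sqrt3}{2} + F_2\!\left(\frac{\sqrt3}{2}, \frac12\right)\frac12,\\
G_2\!\left(1, \frac{\pi}{6}\right) &= -F_1\!\left(\frac{\sqrt3}{2}, \frac12\right)\frac12 + F_2\!\left(\frac{\sqrt3}{2}, \frac12\right)\frac{\sqrt3}{2}.
\end{aligned} \tag{6.5-14}$$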
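
The evaluation at r = 1, θ = π/6 can be tried on a concrete function. In the sketch below the choice F(x, y) = x³ + xy is hypothetical (any differentiable F would do), and its partial derivatives `F1`, `F2` are written out by hand. The formulas of (6.5-13), with the values cos θ = √3/2 and sin θ = 1/2 inserted, are compared with difference quotients of G.

```python
import math

# Hypothetical sample function and its exact partial derivatives.
def F(x, y):
    return x**3 + x * y

def F1(x, y):
    return 3 * x**2 + y

def F2(x, y):
    return x

def G(r, theta):
    return F(r * math.cos(theta), r * math.sin(theta))

def central(fn, v, h=1e-6):
    # Central-difference derivative of a one-variable function.
    return (fn(v + h) - fn(v - h)) / (2 * h)

r, theta = 1.0, math.pi / 6
x, y = math.sqrt(3) / 2, 0.5

# Right-hand sides of (6.5-13) at r = 1, theta = pi/6.
G1_formula = F1(x, y) * math.sqrt(3) / 2 + F2(x, y) * 0.5
G2_formula = -F1(x, y) * 0.5 + F2(x, y) * math.sqrt(3) / 2

# Direct numerical differentiation of G for comparison.
G1_numeric = central(lambda rr: G(rr, theta), r)
G2_numeric = central(lambda th: G(r, th), theta)

print(G1_formula, G1_numeric)
print(G2_formula, G2_numeric)
```

The point of the exercise is the bookkeeping: F_1 and F_2 are evaluated at (x, y) = (√3/2, 1/2), while G_1 and G_2 are evaluated at (r, θ) = (1, π/6).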


Thorough familiarity with the chain rule is of great importance to any student 
who wants to read books and journals in which mathematical methods are used 
extensively. Adequate training in the use of the chain rule must include, among 
other things, stress on the idea of transforming a given function of one set of 
variables into a new function of another set of variables, with a resulting set of 
formulas relating the two sets of partial derivatives of these functions. Ideas of 
this sort are constantly used in the theory of partial differential equations, and 
such ideas are at the very root of tensor analysis, which is an important subject 
beyond the scope of this book. Tensor analysis is an outgrowth and generalization
of vector analysis. A brief discussion of some of the fundamental concepts
of Euclidean vector analysis is given in Chapter 10. The chain rule plays an
important part in the discussion of such concepts as gradient, divergence, and
curl in vector analysis.

For an example of a physical problem where the chain rule is involved see
§6.51.

Matters of notation play a considerable role in connection with the chain 
rule. Wide varieties of usage exist in mathematical writing where the chain rule 
is concerned. Consider the situation expressed in (6.5-6). Instead of writing the 
chain rule here in the form (6.5-7), it is frequently written 


$$\frac{\partial u}{\partial s} = \frac{\partial u}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial s}, \tag{6.5-15}$$

with a similar formula in which $t$ replaces $s$. The difference between (6.5-7) and
(6.5-15) is that in (6.5-15) we have used dependent variables in place of
functional symbols. The student must bear in mind some of the subtleties of
distinction between a dependent variable and a function. If we write $u =
F(x, y)$, $u$ denotes the value of $F$ at the point $(x, y)$. If $x = f(s, t)$ and $y = g(s, t)$,
and if $F(x, y)$ is thereby transformed into $G(s, t)$, we may also write $u = G(s, t)$,
since the value of $G$ at $(s, t)$ is the same as the value of $F$ at $(x, y)$ when $x$ and $y$
are the values of $f$ and $g$, respectively, at $(s, t)$. But though we may write
$u = F(x, y) = G(s, t)$, this does not mean that $F$ and $G$ are the same function; in
general they are not.

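Example 2. If $F$ is a differentiable function of two variables, and

$$u = F(s^2 - t^2, t^2 - s^2),$$

show that

$$t\,\frac{\partial u}{\partial s} + s\,\frac{\partial u}{\partial t} = 0. \tag{6.5-16}$$

One method of handling this problem is to introduce new variables

$$x = s^2 - t^2, \qquad y = t^2 - s^2,$$

so that $u = F(x, y)$. Then, by the chain rule,

$$\begin{aligned}
\frac{\partial u}{\partial s} &= \frac{\partial u}{\partial x}(2s) + \frac{\partial u}{\partial y}(-2s),\\
\frac{\partial u}{\partial t} &= \frac{\partial u}{\partial x}(-2t) + \frac{\partial u}{\partial y}(2t).
\end{aligned}$$

The correctness of (6.5-16) is now readily deduced.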
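
The identity of Example 2 holds for every differentiable F, so it can be tested on any sample. The following sketch assumes a hypothetical F(x, y) = sin x + xy² and estimates the two partial derivatives of u by central differences; the helper names `central` and `residual` are introduced here for illustration.

```python
import math

def F(x, y):
    # Hypothetical sample function; any differentiable F would serve.
    return math.sin(x) + x * y * y

def u(s, t):
    return F(s * s - t * t, t * t - s * s)

def central(fn, v, h=1e-6):
    # Central-difference derivative of a one-variable function.
    return (fn(v + h) - fn(v - h)) / (2 * h)

def residual(s, t):
    # t * du/ds + s * du/dt, which should vanish identically.
    u_s = central(lambda v: u(v, t), s)
    u_t = central(lambda v: u(s, v), t)
    return t * u_s + s * u_t

print(residual(1.3, 0.4))
```

The printed residual should be zero up to the error of the difference quotients, since the factors 2s and -2t from the chain rule cancel after multiplication by t and s.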


It sometimes happens that a problem involves several variables and several
relations between them. Consider, for example, the four variables $V$, $S$, $r$, $h$
connected by the two equations

$$V = \pi r^2 h, \qquad S = 2\pi r h + 2\pi r^2. \tag{6.5-17}$$

These formulas arise if we consider a right circular cylinder of height $h$, radius
of base $r$, volume $V$, and total surface area $S$. Of the four variables, just two are
independent. Ordinarily we most naturally think of $r$ and $h$ as independent, but
other choices are legitimate. We may choose $r$ and $S$ as independent. Then

$$h = \frac{S}{2\pi r} - r, \qquad V = \tfrac12 rS - \pi r^3. \tag{6.5-18}$$

In view of what has been said, it is evident that a notation such as $\dfrac{\partial V}{\partial r}$ is
ambiguous, for if we calculate from (6.5-17) we have $\dfrac{\partial V}{\partial r} = 2\pi r h$, while if we
calculate from (6.5-18) we have $\dfrac{\partial V}{\partial r} = \tfrac12 S - 3\pi r^2$, and these two results are not in
agreement. What is needed is a notation that makes clear the choice of the
independent variables. A customary notation employs subscripts. According to
this practice,

$$\left(\frac{\partial V}{\partial r}\right)_h$$

indicates that we are regarding $V$ as a function of the independent variables $r$, $h$,
with $h$ held constant in the differentiation. With this notation we have

$$\left(\frac{\partial V}{\partial r}\right)_h = 2\pi r h, \qquad \left(\frac{\partial V}{\partial r}\right)_S = \tfrac12 S - 3\pi r^2, \qquad \left(\frac{\partial S}{\partial h}\right)_r = 2\pi r,$$

and so on.
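
The ambiguity becomes tangible when the two derivatives are computed at the same physical cylinder. The sketch below is an illustration only: the sample dimensions r = 2, h = 3 and the helper names are assumptions made here. It differentiates V with respect to r once with h held fixed and once with S held fixed.

```python
import math

def V_of_r_h(r, h):
    # Volume with r and h as the independent variables, from (6.5-17).
    return math.pi * r * r * h

def S_of_r_h(r, h):
    # Total surface area, from (6.5-17).
    return 2 * math.pi * r * h + 2 * math.pi * r * r

def V_of_r_S(r, S):
    # Volume with r and S as the independent variables, from (6.5-18).
    return 0.5 * r * S - math.pi * r**3

def central(fn, v, step=1e-6):
    # Central-difference derivative of a one-variable function.
    return (fn(v + step) - fn(v - step)) / (2 * step)

r, h = 2.0, 3.0
S = S_of_r_h(r, h)  # the same cylinder described by (r, S)

dV_dr_h = central(lambda v: V_of_r_h(v, h), r)  # h held constant
dV_dr_S = central(lambda v: V_of_r_S(v, S), r)  # S held constant

print(dV_dr_h, 2 * math.pi * r * h)
print(dV_dr_S, 0.5 * S - 3 * math.pi * r * r)
```

For this cylinder the first derivative is 12π and the second is -2π: increasing the radius at fixed height adds volume, while increasing it at fixed surface area forces the height down fast enough to lose volume. The subscript notation records which of these two experiments is meant.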

Situations like this arise in thermodynamics, for example, with different
variables, of course, and different relations between them.


EXERCISES 

In all these exercises it is assumed without explicit reference that all the functions
introduced are differentiable.

1. If $w$ is a function of $p$, $q$, $r$ and each of these latter variables is a function of $s$,
write the chain-rule formula for $\dfrac{dw}{ds}$.


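2. If $u = x^2 - y^2$ and $v = 2xy$ transform $F(u, v)$ into $G(x, y)$, find $\dfrac{\partial G}{\partial x}$ and $\dfrac{\partial G}{\partial y}$ in
terms of $\dfrac{\partial F}{\partial u}$ and $\dfrac{\partial F}{\partial v}$.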

3. Define the composite function G and write the formula or formulas of the chain 
rule in each of the following situations. Identify the first- and second-class variables in 
each case. 

(a) u = F(x, y, z), x = f(p, q ), y = g(p, q), z = h(p, q). 

(b) $u = F(w)$, $w = \phi(x, y)$.

(c) $u = F(x, y, z, \theta)$, $x = f(t)$, $y = g(t)$, $z = h(t)$, $\theta = t$.

4. If $x = u$, $y = u + v$ and $z = u + v + w$ transform $F(x, y, z)$ into $G(u, v, w)$, find
each of the first partial derivatives of $G$ in terms of the first partial derivatives of $F$.

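5. Suppose $u$ depends on $x_1$, $x_2$, $x_3$ and the $x$'s depend on $\xi_1$, $\xi_2$ as follows:

$$\begin{aligned}
x_1 &= a_{11}\xi_1 + a_{12}\xi_2,\\
x_2 &= a_{21}\xi_1 + a_{22}\xi_2,\\
x_3 &= a_{31}\xi_1 + a_{32}\xi_2,
\end{aligned}$$

where the $a$'s are constants. Write the formulas connecting $\dfrac{\partial u}{\partial \xi_1}$ and $\dfrac{\partial u}{\partial \xi_2}$ with $\dfrac{\partial u}{\partial x_1}$, $\dfrac{\partial u}{\partial x_2}$, and
$\dfrac{\partial u}{\partial x_3}$. Generalize for the case of $n$ $x$'s and $m$ $\xi$'s.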

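6. If $w = F(a + \alpha t, b + \beta t, c + \gamma t)$, where $a$, $b$, $c$ and $\alpha$, $\beta$, $\gamma$ are constants, write a
formula for $\dfrac{dw}{dt}$ involving $F_1(a + \alpha t, b + \beta t, c + \gamma t)$ and other similar expressions.

7. If $u = f(x - y, y - x)$, prove that $\dfrac{\partial u}{\partial x} + \dfrac{\partial u}{\partial y} = 0$.

8. If $u = F\!\left(\dfrac{y - x}{xy}, \dfrac{z - x}{xz}\right)$, prove that $x^2\dfrac{\partial u}{\partial x} + y^2\dfrac{\partial u}{\partial y} + z^2\dfrac{\partial u}{\partial z} = 0$.

9. If $u = x^3 F\!\left(\dfrac{y}{x}, \dfrac{z}{x}\right)$, prove that $x\dfrac{\partial u}{\partial x} + y\dfrac{\partial u}{\partial y} + z\dfrac{\partial u}{\partial z} = 3u$.

10. Let $f(x, y)$ be transformed into $g(u, v)$ by $x = u\cos\theta - v\sin\theta$, $y =
u\sin\theta + v\cos\theta$, where $\theta$ is constant. Show that

$$\left(\frac{\partial g}{\partial u}\right)^2 + \left(\frac{\partial g}{\partial v}\right)^2 = \left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2$$

is an identity in $u$ and $v$.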

11. If w = /(x, y) and x = r cosh 0, y = r sinh 6, find - p in terms of -|p 

, dw 
and - — 

0y 

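12. Let $F(x, t) = f(x + 2t) + f(3x - 2t)$. Set $u = x + 2t$, $v = 3x - 2t$, and show by the
chain rule that

$$F_1(x, t) = f'(u) + 3f'(v),$$

$$F_2(x, t) = 2f'(u) - 2f'(v).$$

Hence show that $F_1(0, 0) = 4f'(0)$, $F_2(0, 0) = 0$.

13. Let $z = y f(x^2 - y^2)$. Show that $y\dfrac{\partial z}{\partial x} + x\dfrac{\partial z}{\partial y} = \dfrac{xz}{y}$.

14. If $z = xy + xF\!\left(\dfrac{y}{x}\right)$, show that $x\dfrac{\partial z}{\partial x} + y\dfrac{\partial z}{\partial y} = xy + z$.

15. Suppose $u$ is a function of $r$ and $r = (x^2 + y^2 + z^2)^{1/2}$. Show that

$$\left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial y}\right)^2 + \left(\frac{\partial u}{\partial z}\right)^2 = \left(\frac{du}{dr}\right)^2.$$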

16. Consider the determinant

$$D = \begin{vmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{vmatrix}.$$

Let $A_{ij}$ be the cofactor (appropriately signed minor) of $a_{ij}$ in $D$. If the nine elements of $D$
are regarded as independent variables, show that $\dfrac{\partial D}{\partial a_{ij}} = A_{ij}$. Now suppose that the
elements $a_{ij}$ are differentiable functions of $x$, and let $a'_{ij} = \dfrac{d}{dx}(a_{ij})$. Show by the chain rule
that

$$\frac{dD}{dx} =
\begin{vmatrix} a'_{11} & a'_{12} & a'_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{vmatrix}
+ \begin{vmatrix} a_{11} & a_{12} & a_{13}\\ a'_{21} & a'_{22} & a'_{23}\\ a_{31} & a_{32} & a_{33} \end{vmatrix}
+ \begin{vmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a'_{31} & a'_{32} & a'_{33} \end{vmatrix}.$$

17. Generalize the results of Exercise 16 for determinants of order $n$. State the final
result as a verbal rule.

18. If

$$F(x, y, z) = \begin{vmatrix} f_1(x) & f_2(x) & f_3(x)\\ f_1(y) & f_2(y) & f_3(y)\\ f_1(z) & f_2(z) & f_3(z) \end{vmatrix},$$

calculate $\dfrac{\partial F}{\partial x}$, $\dfrac{\partial F}{\partial y}$, and $\dfrac{\partial F}{\partial z}$ by the results of Exercise 16.

19. Write out in detail and prove the version of Theorem III that corresponds to
(6.5-1), (6.5-2), and (6.5-5).

20. In Example 2 let $G(s, t) = F(s^2 - t^2, t^2 - s^2)$. Formula (6.5-16) now becomes
$tG_1(s, t) + sG_2(s, t) = 0$. Rewrite the chain-rule formulas in the solution of Example 2
without using the letters $x$, $y$, $u$.

21. From the situation expressed by (6.5-17) find , (f^) * (f^) ancl 

(i), 

22. (a) If w = x 2 +y 2 +z 2 and z = xyv, how many meanings are there for 

(b) Find in each of its meanings, indicating the meaning in each case by proper 
dy dy 

subscripts. 

23. (a) If $V = xyz$ and $S = xy + 2xz + 2yz$, find $\dfrac{\partial V}{\partial x}$ in each of its possible meanings,
indicating the meaning in each case by proper subscripts. (b) Find

24. If $u = F(x, y)$ and $y = f(x)$, show that $\dfrac{du}{dx} = \dfrac{\partial u}{\partial x} + \dfrac{\partial u}{\partial y}\dfrac{dy}{dx}$. Explain carefully the
difference between $\dfrac{du}{dx}$ and $\dfrac{\partial u}{\partial x}$.

25. If $G(x, y) = F(x, y, f(x, y))$, show that $G_1(x, y) = F_1(x, y, z) + F_3(x, y, z)f_1(x, y)$,
where $z = f(x, y)$. Classify the variables and show that this is an instance of the chain
rule. Write the corresponding formula for $G_2(x, y)$.


6.51 / AN APPLICATION IN FLUID KINEMATICS 

A very good illustration of the occurrence of formulas such as (6.5-4) or (6.5-5) 
in physics is furnished in the very beginning of the study of the flow of a fluid. 

To build up a mathematical model of an arbitrary fluid in motion, let us fix our
attention upon a certain portion of the fluid at a certain instant of time, say the
fluid occupying a region $R_0$ at the time $t = 0$. Then, selecting an arbitrary particle
of the fluid in $R_0$, we follow its motion as $t$ increases. Each particle of the
fluid will occupy a certain point of space at a certain time. At a general instant $t$
the portion of the fluid which occupied $R_0$ at $t = 0$ will occupy a new region
$R$; the particle which was at the point $(x_0, y_0, z_0)$ of $R_0$ will have moved to a point
$(x, y, z)$ of $R$ (see Fig. 43). We assume that the law of motion of the fluid is a
definite (but perhaps very complicated) law, so that the co-ordinates $(x, y, z)$ of
the particle in $R$ are determined by (i.e., are functions of) $x_0$, $y_0$, $z_0$, $t$, say

$$\begin{aligned}
x &= f(x_0, y_0, z_0, t),\\
y &= g(x_0, y_0, z_0, t),\\
z &= h(x_0, y_0, z_0, t).
\end{aligned} \tag{6.51-1}$$

These equations define the flow of the fluid for time subsequent to $t = 0$. In what
follows we shall regard the functions $f$, $g$, $h$ as known without specifying them in
any particular manner.

The velocity of an individual particle at a given instant has components

$$\frac{\partial x}{\partial t}, \qquad \frac{\partial y}{\partial t}, \qquad \frac{\partial z}{\partial t}.$$

The velocity is a vector quantity; it is tangent to the path which is being followed 
by the particle under consideration. 

Now let us consider the density of the fluid. If the mass of a substance is not 
distributed uniformly throughout its volume, it has variable density, and we must 
speak of the density at a point. Let us then consider the fluid in the region $R$ at
time $t$; let $\rho$ be the density at the point $x$, $y$, $z$. Then $\rho$ is a function of $x$, $y$, $z$.
Conceivably $\rho$ might be constant, but it need not be so. (The fluid might be the
atmosphere, and $R$ might be a region extending from sea level up to a height of
30,000 feet.) It is important to note that $\rho$ does not depend merely on $x$, $y$, $z$. It is
also a function of $t$, for the fluid which occupies a given region at time $t$ need not
have the same mass distribution as the fluid which occupies that region at some
other time. Let us write

$$\rho = F(x, y, z, t). \tag{6.51-2}$$

If in this equation we keep $t$ fixed and vary the point $(x, y, z)$, we obtain the
density of the fluid in $R$ at time $t$. On the other hand, if we fix the point $(x, y, z)$
and vary $t$, then we obtain the density at $(x, y, z)$ of the fluid occupying $R$ at
different times.

There is still another important way of examining the variability of $\rho$.
Imagine an observer moving along with a particle of the fluid. If he were capable
of measuring $\rho$ at his own position at any time, by what equation would he
describe his observations? For him $\rho$ is the composite function obtained by
substituting $x$, $y$, $z$ from (6.51-1) in (6.51-2). The result is that $\rho$ is a function of
$x_0$, $y_0$, $z_0$, $t$. Since the observer stays with the same particle always, $x_0$, $y_0$, $z_0$ are
to be considered as constants, and $\rho$ becomes a function of $t$ alone, say

$$\rho = G(t). \tag{6.51-3}$$

Now let us ask the question: From the point of view of the observer moving
with the particle, what is the rate of change of $\rho$? The answer is, of course, the
value of the derivative $G'(t)$. Since $G$ is a composite function we may express
$G'(t)$ by the chain rule. We see in (6.51-2) that $F$ depends on four variables, $x$, $y$,
$z$, $t$. These are first-class variables. By equations (6.51-1) the variables $x$, $y$, $z$, $t$
depend on other variables $x_0$, $y_0$, $z_0$, $t$. These are second-class variables. Note
that $t$ is both a first-class variable and a second-class variable. One way out of
the perplexity which the student may at first experience in this situation may be
found by an artifice of notation. Let us temporarily introduce a different letter,
say $\theta$, to stand for $t$ in the set of first-class variables. We then write $\rho =
F(x, y, z, \theta)$ and add the equation $\theta = t$ to the set (6.51-1). The chain rule then
gives

$$\frac{dG}{dt} = \frac{\partial F}{\partial x}\frac{\partial f}{\partial t} + \frac{\partial F}{\partial y}\frac{\partial g}{\partial t} + \frac{\partial F}{\partial z}\frac{\partial h}{\partial t} + \frac{\partial F}{\partial \theta}\frac{d\theta}{dt}.$$

Having obtained the formula, we may forget the artifice, and write $t$ again in
place of $\theta$. Note that $\dfrac{d\theta}{dt} = 1$. It is more usual, in the literature of this subject, to
write the letters $\rho$, $x$, $y$, $z$ instead of the functional symbols for these variables.
When this is done we have

$$\frac{d\rho}{dt} = \frac{\partial \rho}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial \rho}{\partial y}\frac{\partial y}{\partial t} + \frac{\partial \rho}{\partial z}\frac{\partial z}{\partial t} + \frac{\partial \rho}{\partial t}.$$

The derivative of $\rho$ with respect to $t$ occurs here in two different forms, with
different meanings.
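
The two meanings of the time derivative can be separated in a small numerical experiment. Everything specific in the sketch below is an assumption made for illustration: a uniform wind of speed `WIND` along the x-axis stands in for the flow (6.51-1), and `F` is an invented density field. The rate seen by an observer riding with a particle is G'(t); the chain rule says it equals the spatial derivative of F times the velocity plus the derivative of F with respect to t at a fixed point.

```python
import math

WIND = 2.0  # hypothetical constant wind speed along the x-axis

def position(x0, y0, z0, t):
    # A simple stand-in for the flow equations: uniform drift in x.
    return x0 + WIND * t, y0, z0

def F(x, y, z, t):
    # Hypothetical density field depending on position and time.
    return math.exp(-0.1 * z) * (1.0 + 0.5 * math.sin(x) + 0.2 * t)

def G(t, x0=0.3, y0=0.0, z0=1.0):
    # Density seen by the observer who travels with the particle.
    return F(*position(x0, y0, z0, t), t)

def central(fn, v, h=1e-6):
    # Central-difference derivative of a one-variable function.
    return (fn(v + h) - fn(v - h)) / (2 * h)

def rate_following_particle(t, x0=0.3, y0=0.0, z0=1.0):
    # Chain rule: (dF/dx)(dx/dt) + dF/dt; y and z do not change in this flow.
    x, y, z = position(x0, y0, z0, t)
    dF_dx = central(lambda v: F(v, y, z, t), x)
    dF_dt = central(lambda v: F(x, y, z, v), t)
    return dF_dx * WIND + dF_dt

print(central(G, 1.0), rate_following_particle(1.0))
```

In this sketch the derivative of F with respect to t alone corresponds to the observer standing at a fixed point, while G'(t) corresponds to the observer drifting with the fluid; the two differ by the velocity term.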

EXERCISES 

1. Let the fluid flow be the movement of the atmosphere. From the point of view of 
an observer standing on a street corner with the wind rushing by, what is the expression 
for the rate of change of density of air on the street corner? If the observer were to ride 
in the gondola of a freely-drifting balloon, what would be the expression for the rate of 
change of density of air at the gondola? 

2. Suppose the atmosphere over a great plain is in a state such that at any given time
$\rho$ is a function only of altitude above the plain. Suppose that a free balloon is drifting
along at a constant elevation of 500 feet. Explain why an observer carried by the balloon
finds that $\dfrac{d\rho}{dt} = \dfrac{\partial \rho}{\partial t}$.

3. Suppose that, in (6.51-1), $z$ is independent of $t$, and that

$$\rho = H(x^2 + y^2 + z^2, t).$$

Show that $\dfrac{d\rho}{dt} = \dfrac{\partial \rho}{\partial t}$ at all points on the $z$-axis.


6.52 / SECOND DERIVATIVES BY THE CHAIN RULE 

The purpose of this section is to illustrate by examples the use of the chain rule 
in dealing with second derivatives. 

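Example 1. If $F(x, y)$ is transformed into $G(r, \theta)$ by the equations $x =
r\cos\theta$, $y = r\sin\theta$, find $\dfrac{\partial^2 G}{\partial r\,\partial\theta}$ in terms of derivatives of $F$ with respect to $x$ and
$y$.

We start from the work already done in Example 1, §6.5. From (6.5-13) we
have

$$\frac{\partial G}{\partial \theta} = -\frac{\partial F}{\partial x}\,r\sin\theta + \frac{\partial F}{\partial y}\,r\cos\theta.$$

We now differentiate both sides of this equation with respect to $r$, bearing in
mind that $\dfrac{\partial F}{\partial x}$ and $\dfrac{\partial F}{\partial y}$ depend on $x$ and $y$, and are thus to be regarded as
composite functions of $r$ and $\theta$. We have

$$\frac{\partial^2 G}{\partial r\,\partial\theta} = -\frac{\partial F}{\partial x}\frac{\partial}{\partial r}(r\sin\theta) - \frac{\partial}{\partial r}\!\left(\frac{\partial F}{\partial x}\right) r\sin\theta
+ \frac{\partial F}{\partial y}\frac{\partial}{\partial r}(r\cos\theta) + \frac{\partial}{\partial r}\!\left(\frac{\partial F}{\partial y}\right) r\cos\theta. \tag{6.52-1}$$

The problem now facing us is that of doing something further with expressions
like $\dfrac{\partial}{\partial r}\!\left(\dfrac{\partial F}{\partial x}\right)$. Now $\dfrac{\partial F}{\partial x} = F_1(x, y)$. What we really have before us is the problem
of finding $\dfrac{\partial H}{\partial r}$, where $H(r, \theta) = F_1(r\cos\theta, r\sin\theta)$. This is a problem exactly like
that of finding $\dfrac{\partial G}{\partial r}$, except that we have $F_1$ in place of $F$, and $H$ in place of $G$.
Thus, just as in (6.5-13),

$$\frac{\partial H}{\partial r} = \frac{\partial F_1}{\partial x}\cos\theta + \frac{\partial F_1}{\partial y}\sin\theta.$$

This may be written

$$\frac{\partial}{\partial r}\!\left(\frac{\partial F}{\partial x}\right) = \frac{\partial^2 F}{\partial x^2}\cos\theta + \frac{\partial^2 F}{\partial y\,\partial x}\sin\theta. \tag{6.52-2}$$

In a similar way we find

$$\frac{\partial}{\partial r}\!\left(\frac{\partial F}{\partial y}\right) = \frac{\partial^2 F}{\partial x\,\partial y}\cos\theta + \frac{\partial^2 F}{\partial y^2}\sin\theta. \tag{6.52-3}$$

If we now use (6.52-2) and (6.52-3) in (6.52-1), we obtain

$$\begin{aligned}
\frac{\partial^2 G}{\partial r\,\partial\theta} ={}& -\frac{\partial F}{\partial x}\sin\theta - \left(\frac{\partial^2 F}{\partial x^2}\cos\theta + \frac{\partial^2 F}{\partial y\,\partial x}\sin\theta\right) r\sin\theta\\
& + \frac{\partial F}{\partial y}\cos\theta + \left(\frac{\partial^2 F}{\partial x\,\partial y}\cos\theta + \frac{\partial^2 F}{\partial y^2}\sin\theta\right) r\cos\theta.
\end{aligned} \tag{6.52-4}$$

In order to get this far we naturally assume that $F$ and its partial derivatives $F_1$,
$F_2$ are differentiable functions of $x$, $y$. Under this assumption it is true that
$\dfrac{\partial^2 F}{\partial y\,\partial x} = \dfrac{\partial^2 F}{\partial x\,\partial y}$ (see Theorem IV, §7.2). Therefore (6.52-4) may be written in a
slightly more compact form. If we write $u = F(x, y) = G(r, \theta)$, (6.52-4) becomes

$$\frac{\partial^2 u}{\partial r\,\partial\theta} = -r\sin\theta\cos\theta\,\frac{\partial^2 u}{\partial x^2} + r(\cos^2\theta - \sin^2\theta)\frac{\partial^2 u}{\partial x\,\partial y}
+ r\sin\theta\cos\theta\,\frac{\partial^2 u}{\partial y^2} - \sin\theta\,\frac{\partial u}{\partial x} + \cos\theta\,\frac{\partial u}{\partial y}. \tag{6.52-5}$$

The work of finding formulas for $\dfrac{\partial^2 u}{\partial r^2}$ and $\dfrac{\partial^2 u}{\partial \theta^2}$ is similar.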
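
The mixed-derivative formula of Example 1 (five terms in the second and first derivatives of u) can be verified on a sample function. The sketch below assumes the hypothetical choice F(x, y) = x²y, whose derivatives are written out by hand, and compares the formula with a four-point difference estimate of the mixed derivative of G; the function names are introduced here for illustration.

```python
import math

def F(x, y):
    # Hypothetical sample function u = F(x, y) = x^2 y.
    return x * x * y

def G(r, theta):
    return F(r * math.cos(theta), r * math.sin(theta))

def mixed_numeric(r, theta, h=1e-4):
    # Four-point central estimate of the mixed second derivative of G.
    return (G(r + h, theta + h) - G(r + h, theta - h)
            - G(r - h, theta + h) + G(r - h, theta - h)) / (4 * h * h)

def mixed_formula(r, theta):
    # The five-term expression for the mixed derivative in polar co-ordinates.
    s, c = math.sin(theta), math.cos(theta)
    x, y = r * c, r * s
    u_x, u_y = 2 * x * y, x * x             # first derivatives of F
    u_xx, u_xy, u_yy = 2 * y, 2 * x, 0.0    # second derivatives of F
    return (-r * s * c * u_xx + r * (c * c - s * s) * u_xy
            + r * s * c * u_yy - s * u_x + c * u_y)

print(mixed_numeric(1.5, 0.7), mixed_formula(1.5, 0.7))
```

For this particular F one can also check by hand that both sides reduce to 3r²(cos³θ - 2 cos θ sin²θ).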

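Example 2. If $u = F(x, y)$ becomes $G(s, t)$ when we set

$$x = s + t, \qquad y = s - t,$$

find what $\dfrac{\partial^2 u}{\partial x^2} - \dfrac{\partial^2 u}{\partial y^2}$ becomes in terms of derivatives with respect to $s$ and $t$.

For this purpose we express $s$ and $t$ in terms of $x$ and $y$; we regard $s$, $t$ as
first-class variables and $x$, $y$ as second-class variables. We have

$$s = \tfrac12(x + y), \qquad t = \tfrac12(x - y).$$

Then

$$\frac{\partial u}{\partial x} = \frac{\partial u}{\partial s}\frac{\partial s}{\partial x} + \frac{\partial u}{\partial t}\frac{\partial t}{\partial x} = \frac12\frac{\partial u}{\partial s} + \frac12\frac{\partial u}{\partial t}. \tag{6.52-6}$$

Likewise,

$$\frac{\partial u}{\partial y} = \frac12\frac{\partial u}{\partial s} - \frac12\frac{\partial u}{\partial t}.$$

Differentiating again,

$$\frac{\partial^2 u}{\partial x^2} = \frac12\frac{\partial}{\partial x}\!\left(\frac{\partial u}{\partial s}\right) + \frac12\frac{\partial}{\partial x}\!\left(\frac{\partial u}{\partial t}\right).$$

Now, by (6.52-6), with $\dfrac{\partial u}{\partial s}$ in place of $u$,

$$\frac{\partial}{\partial x}\!\left(\frac{\partial u}{\partial s}\right) = \frac12\frac{\partial}{\partial s}\!\left(\frac{\partial u}{\partial s}\right) + \frac12\frac{\partial}{\partial t}\!\left(\frac{\partial u}{\partial s}\right).$$

We find $\dfrac{\partial}{\partial x}\!\left(\dfrac{\partial u}{\partial t}\right)$ in the same way. Thus

$$\frac{\partial^2 u}{\partial x^2} = \frac14\frac{\partial^2 u}{\partial s^2} + \frac14\frac{\partial^2 u}{\partial t\,\partial s} + \frac14\frac{\partial^2 u}{\partial s\,\partial t} + \frac14\frac{\partial^2 u}{\partial t^2}. \tag{6.52-8}$$

We shall assume enough about $u$ to insure that $\dfrac{\partial^2 u}{\partial t\,\partial s} = \dfrac{\partial^2 u}{\partial s\,\partial t}$. The student should
show for himself that

$$\frac{\partial^2 u}{\partial y^2} = \frac14\frac{\partial^2 u}{\partial s^2} - \frac12\frac{\partial^2 u}{\partial t\,\partial s} + \frac14\frac{\partial^2 u}{\partial t^2}. \tag{6.52-9}$$

Subtraction of (6.52-9) from (6.52-8) now gives the result

$$\frac{\partial^2 u}{\partial x^2} - \frac{\partial^2 u}{\partial y^2} = \frac{\partial^2 u}{\partial t\,\partial s}. \tag{6.52-10}$$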
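
The result of Example 2 says that the difference of the two pure second derivatives in x and y equals the mixed second derivative in s and t. A quick check on one function is sketched below; the choice F(x, y) = x³y is hypothetical, and its second derivatives are entered by hand, while the mixed derivative of G is estimated by a four-point difference.

```python
def F(x, y):
    # Hypothetical sample function.
    return x**3 * y

def F_xx(x, y):
    return 6 * x * y

def F_yy(x, y):
    return 0.0

def G(s, t):
    # The transformed function under x = s + t, y = s - t.
    return F(s + t, s - t)

def G_st_numeric(s, t, h=1e-4):
    # Four-point central estimate of the mixed second derivative of G.
    return (G(s + h, t + h) - G(s + h, t - h)
            - G(s - h, t + h) + G(s - h, t - h)) / (4 * h * h)

s, t = 0.8, 0.3
x, y = s + t, s - t
print(F_xx(x, y) - F_yy(x, y), G_st_numeric(s, t))
```

For this F both sides equal 6(s + t)(s - t), which at the sample point is 3.3 up to rounding.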


The basic point to be noted in using the chain rule in connection with second
derivatives is this: Suppose $u$ is a function of first-class variables $x$, $y$, . . . and of
second-class variables $s$, $t$, . . . . Then, if we have written down a chain-rule
formula for one of the derivatives $\dfrac{\partial u}{\partial s}$, $\dfrac{\partial u}{\partial t}$, . . . , this same formula is valid if we
replace $u$ throughout by any one of the derivatives $\dfrac{\partial u}{\partial x}$, $\dfrac{\partial u}{\partial y}$, . . . . We are thus able
to express symbols like $\dfrac{\partial}{\partial s}\!\left(\dfrac{\partial u}{\partial x}\right)$ entirely in terms of second derivatives of $u$ with
respect to the first-class variables $x$, $y$, . . . .


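EXERCISES

1. Find formulas comparable to (6.52-4) for $\dfrac{\partial^2 G}{\partial r^2}$ and $\dfrac{\partial^2 G}{\partial \theta^2}$ in the problem of
Example 1. Hence show that

$$\frac{\partial^2 G}{\partial r^2} + \frac{1}{r}\frac{\partial G}{\partial r} + \frac{1}{r^2}\frac{\partial^2 G}{\partial \theta^2} = \frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2}.$$

2. If $u = F(x, y)$ becomes $G(r, \theta)$ when $x = r\cosh\theta$, $y = r\sinh\theta$, show that

$$\frac{\partial^2 u}{\partial r^2} + \frac{1}{r}\frac{\partial u}{\partial r} - \frac{1}{r^2}\frac{\partial^2 u}{\partial \theta^2} = \frac{\partial^2 u}{\partial x^2} - \frac{\partial^2 u}{\partial y^2}.$$

3. If $G(s, t) = F(e^s\cos t, e^s\sin t)$, show that $G_{11} + G_{22} = e^{2s}(F_{11} + F_{22})$, where $G_{11}$
and $G_{22}$ are evaluated at $(s, t)$ and $F_{11}$ and $F_{22}$ are evaluated at $(e^s\cos t, e^s\sin t)$.

4. Give a complete proof of (6.52-9).

5. If $\xi = x + ct$, $\eta = x - ct$, and $u = F(x, t) = G(\xi, \eta)$, find $\dfrac{\partial^2 u}{\partial t^2} - c^2\dfrac{\partial^2 u}{\partial x^2}$ in terms of
derivatives with respect to $\xi$ and $\eta$.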

, „ d 2 u 


6. Show that 5 


0 z u . d z u 


dx 4 


ax ay + 2 W becomes i# + i# if we set « = ’ (x + y) ’ ^ = 


k*-2y). 

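7. If $u = \phi(x - ct) + \psi(x + ct)$, show that

$$\frac{\partial^2 u}{\partial t^2} = c^2\frac{\partial^2 u}{\partial x^2}.$$

8. (a) If $u = F(x^2 - 2xy)$, let $w = x^2 - 2xy$ and show that $\dfrac{\partial^2 u}{\partial x^2} =
2F'(w) + (2x - 2y)^2F''(w)$.

(b) Find similar expressions for $\dfrac{\partial^2 u}{\partial x\,\partial y}$ and $\dfrac{\partial^2 u}{\partial y^2}$.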

(c) Verify that x |^ + (x - y) 


0. 


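9. (a) If $u = F(r)$ and $r = (x^2 + y^2 + z^2)^{1/2}$, show that

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = \frac{d^2 u}{dr^2} + \frac{2}{r}\frac{du}{dr}.$$

Hence find the form of $F(r)$ if this last expression is equal to zero when $r > 0$.
(b) Generalize the results of (a) for $u = F(r)$, $r = (x_1^2 + x_2^2 + \cdots + x_n^2)^{1/2}$.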

10. If V = ig^t where c is constant and r = (x 2 + y 2 + z 2 ) l/2 , show 


that 


d^v.d^v d 2 v _ 1 a 2 v 

0x 2 + 0y 2 + dz 2 c 2 dt 2 


11. If u = F[x + /(y )], show that 


12. Show that y^+(x + y) — 
y 2 -x 2 , t = y -x. 

13. If u = x 3 F(y/x, z/x), show that 

2 0 2 W 


du d 2 U _ dli d u 

dx dx dy dy dx 2 

u 

+ X 


d 2 u , _ 2 d 2 u 

w becomes -2t — 


2 1 when we set 

ds 


2 d 2 U 


2 d 2 U 


dx J 


dy 4 


dz 4 


. d 2 u , . du du 

2yz a^ +4y ^ + 4z JI 


6 u. 


14. If u = F(x, y) = G(s, t), where 5 = xy, t - 


find 


2xy 


y dx dx 

x 2 y 2 u in terms of s, t, u , and derivatives of u with respect to s and t. 


2U " + (y-y 3 )^+ 


du 


dy 


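15. Prove that setting $x = e^s$, $y = e^t$ changes $x^2\dfrac{\partial^2 u}{\partial x^2} + y^2\dfrac{\partial^2 u}{\partial y^2} + x\dfrac{\partial u}{\partial x} + y\dfrac{\partial u}{\partial y}$ into
$\dfrac{\partial^2 u}{\partial s^2} + \dfrac{\partial^2 u}{\partial t^2}$.

16. Suppose that $x = f(u, v)$, $y = g(u, v)$ transform $F(x, y)$ into $G(u, v)$. Suppose also
that

$$\frac{\partial f}{\partial u} = \frac{\partial g}{\partial v}, \qquad \frac{\partial f}{\partial v} = -\frac{\partial g}{\partial u}.$$

Prove that

$$\frac{\partial^2 G}{\partial u^2} + \frac{\partial^2 G}{\partial v^2} = \left(\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2}\right)\left[\left(\frac{\partial f}{\partial u}\right)^2 + \left(\frac{\partial f}{\partial v}\right)^2\right]$$

is an identity in $u$ and $v$.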


17. If u = x n f(ylx) + x "g(y/x ), show that 


2 a 2 w 


ax' 


d 2 U 2 d 2 U dU du 

+ y J? + x te + y dj 


18. (a) If $u = F(x, y) = G(s, t)$ and $x = f(s, t)$, $y = g(s, t)$, show that $G_{11} =
F_{11}f_1^2 + F_{22}g_1^2 + 2F_{12}f_1g_1 + F_1f_{11} + F_2g_{11}$, where for convenience all the variables have been
omitted from the functional symbols. (b) Write this same formula using the variable $u$
and not using any of the functional symbols $F$, $G$, $f$, $g$. (c) Write analogous formulas for
$G_{12}$ and $G_{22}$.

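19. Let

$$\begin{aligned}
\xi_1 &= a_{11}x_1 + a_{12}x_2,\\
\xi_2 &= a_{21}x_1 + a_{22}x_2,
\end{aligned}$$

where the coefficients $a_{11}, \ldots$ are constants. With this change of variable, show that

$$a\frac{\partial^2 u}{\partial x_1^2} + 2b\frac{\partial^2 u}{\partial x_1\,\partial x_2} + c\frac{\partial^2 u}{\partial x_2^2} = \alpha\frac{\partial^2 u}{\partial \xi_1^2} + 2\beta\frac{\partial^2 u}{\partial \xi_1\,\partial \xi_2} + \gamma\frac{\partial^2 u}{\partial \xi_2^2},$$

where the coefficients $a$, $b$, $c$ and $\alpha$, $\beta$, $\gamma$ are related in such a way that

$$\alpha\gamma - \beta^2 = (ac - b^2)(a_{11}a_{22} - a_{21}a_{12})^2.$$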

20. If the change of variable (with constant coefficients) 


changes 


show that 



CijXj 



d^u 

dXi dx j 


into 


2 bid 

k, r=i 


d 2 u 
dgk d& 


2i CkidijCif. 

i, j — I 


6.53 / HOMOGENEOUS FUNCTIONS. EULER’S THEOREM 

Consider the functions

$$x^2 + y^2, \qquad \frac{xy}{x + y}, \qquad x^2 y\log\frac{y}{x}.$$

Each of these functions has the interesting property that, if the variables $x$, $y$ are
replaced by $tx$, $ty$ respectively, where $t$ is a parameter, we obtain the original
function multiplied by a power of $t$:

$$(tx)^2 + (ty)^2 = t^2(x^2 + y^2),$$

$$\frac{(tx)(ty)}{(tx) + (ty)} = t\,\frac{xy}{x + y},$$

$$(tx)^2(ty)\log\frac{ty}{tx} = t^3\left(x^2 y\log\frac{y}{x}\right).$$

Functions of this type are called homogeneous. In general, we say that $F(x, y)$ is
homogeneous of degree $n$ if

$$F(tx, ty) = t^n F(x, y) \tag{6.53-1}$$

for all values of $x$, $y$, and $t$ for which $F(x, y)$ and $F(tx, ty)$ are defined. The
degree $n$ is a constant; it need not be an integer, and it may be negative. It may
also be zero.


Example 1. (a) 


x 2 + y 2 


is homogeneous of degree — 1. 


(b) x ,73 + xy" 2/3 


is 


1 x 2 - y 2 

homogeneous of degree i (c) - 2 2 is homogeneous of degree 0. 

x *+• y 

If the fundamental relation (6.53-1) holds true only when $t$ is restricted to
positive (or nonnegative) values, we say that $F$ is positively homogeneous of
degree $n$. An example in which the limitation to nonnegative $t$ is essential is
furnished by the function $\sqrt{x^2 + y^2}$, which is positively homogeneous of degree
1. Since by definition the radical sign calls for the nonnegative square root of the
radicand,

$$\sqrt{(tx)^2 + (ty)^2} = t\sqrt{x^2 + y^2}$$

holds only if $t \geq 0$.
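
The distinction between homogeneous and positively homogeneous functions can be seen by testing the scaling relation with a negative value of t. The sketch below is illustrative only; the names `p`, `q`, `w`, `norm`, and `scales_like` are introduced here. The first three functions are the introductory examples of this section, and the fourth is the square-root example just discussed.

```python
import math

def p(x, y):
    return x * x + y * y                 # degree 2

def q(x, y):
    return x * y / (x + y)               # degree 1

def w(x, y):
    return x * x * y * math.log(y / x)   # degree 3

def norm(x, y):
    return math.sqrt(x * x + y * y)      # positively homogeneous, degree 1

def scales_like(fn, n, x, y, t):
    # True when fn(tx, ty) agrees with t**n * fn(x, y) to rounding error.
    return math.isclose(fn(t * x, t * y), t**n * fn(x, y), rel_tol=1e-12)

x, y = 1.5, 2.5
for t in (3.0, -3.0):
    print(t, scales_like(p, 2, x, y, t), scales_like(q, 1, x, y, t),
          scales_like(w, 3, x, y, t), scales_like(norm, 1, x, y, t))
```

With t = 3 all four checks succeed; with t = -3 the first three still succeed but the check for `norm` fails, because the square root is nonnegative while t times it is negative.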


The definition of homogeneity is extended in an obvious way to functions of 
any number of variables. 

The meaning of homogeneity can be interpreted 
geometrically. We refer back to §5.4, where we discus- 
sed modes of representing a function. If (x, y) is a point 
distinct from the origin in the plane, the set of all 
points (tx, ty), as t varies, fills out the line determined 
by (x, y) and the origin (see Fig. 44). The fundamental 
relation (6.53-1) states that the value of the function 
F at the point (tx, ^y) is t n times the value of the func- 
tion at the point (x, y). Thus, once we know the value 
of F at a point other than (0,0) on the line, we can 
compute its value at all other points on the line where 
it is defined. We must of course know the degree n. If 
the function is merely positively homogeneous, the requirement I = 0 limits us to 
points (tx, ty) on the same side of the origin as (x, y) itself. It is clear that if F is 
positively homogeneous, and if we know its values at all points of the circle 
x 2 + y 2 = 1, we can compute its value at any other point where it is defined. 

Many simple or important functions arising in applied mathematics are 
homogeneous or positively homogeneous. For example, if F is such that its 
values are directly proportional to the nth power of the distance from the origin 
to (x, y), then F is positively homogeneous of degree n. In fact, we can write 

$$F(x, y) = Ar^n, \quad r = (x^2 + y^2)^{1/2}, \quad \text{whence } F(tx, ty) = t^n F(x, y) \text{ if } t \ge 0.$$

One of the useful pieces of information about homogeneous functions is that
furnished by a theorem named after the 18th century Swiss mathematician
Leonhard Euler.


EULER’S THEOREM (THEOREM IV). Let F(x, y) be positively homogeneous
of degree n. Then at any point where F is differentiable we have

$$x\frac{\partial F}{\partial x} + y\frac{\partial F}{\partial y} = nF(x, y). \tag{6.53-2}$$

Proof. Write u = tx, v = ty. Consider u, v as first-class variables, and t, x, y
as second-class variables. The partial derivative of F(u, v) = F(tx, ty) with
respect to t is

$$F_1(u, v)\frac{\partial u}{\partial t} + F_2(u, v)\frac{\partial v}{\partial t},$$

or

$$xF_1(tx, ty) + yF_2(tx, ty),$$

provided that F is differentiable at (tx, ty). On the other hand, for positive t,
$F(tx, ty) = t^n F(x, y)$, and $\frac{\partial}{\partial t}\bigl(t^n F(x, y)\bigr) = nt^{n-1}F(x, y)$. Thus

$$xF_1(tx, ty) + yF_2(tx, ty) = nt^{n-1}F(x, y). \tag{6.53-3}$$

If we put t = 1 here we obtain the desired relation (6.53-2). 

The theorem and the proof extend to functions of three or more independent 
variables. There is also a converse theorem; see Exercise 6. 
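
Euler's relation (6.53-2) lends itself to a quick numerical check. The following is a minimal sketch, assuming central differences are an adequate stand-in for the partial derivatives; the helper names and the sample point (1.2, -0.7) are illustrative choices, not part of the text.

```python
import math

def partials(F, x, y, h=1e-6):
    """Central-difference estimates of the first partial derivatives of F."""
    Fx = (F(x + h, y) - F(x - h, y)) / (2 * h)
    Fy = (F(x, y + h) - F(x, y - h)) / (2 * h)
    return Fx, Fy

def euler_residual(F, n, x, y):
    """Return x*F_x + y*F_y - n*F at (x, y); it vanishes when (6.53-2) holds."""
    Fx, Fy = partials(F, x, y)
    return x * Fx + y * Fy - n * F(x, y)

def distance(x, y):
    # sqrt(x^2 + y^2): positively homogeneous of degree 1
    return math.sqrt(x * x + y * y)

def quotient(x, y):
    # (x^2 - y^2)/(x^2 + y^2): homogeneous of degree 0
    return (x * x - y * y) / (x * x + y * y)

residual_distance = euler_residual(distance, 1, 1.2, -0.7)
residual_quotient = euler_residual(quotient, 0, 1.2, -0.7)
```

Both residuals come out at the level of the finite-difference error, as the theorem predicts for a point other than the origin.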


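Example 2. If F(x, y) is positively homogeneous of degree 2, and
$u = r^m F(x, y)$, where $r^2 = x^2 + y^2$, show that

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = r^m\left(\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2}\right) + m(m + 4)r^{m-2}F.$$

In working this problem it is convenient to observe that

$$2r\frac{\partial r}{\partial x} = 2x, \quad \text{or} \quad \frac{\partial r}{\partial x} = \frac{x}{r}.$$

We have, therefore,

$$\frac{\partial u}{\partial x} = r^m\frac{\partial F}{\partial x} + mr^{m-1}\frac{\partial r}{\partial x}F = r^m\frac{\partial F}{\partial x} + mr^{m-2}xF.$$

Differentiating again,

$$\begin{aligned}
\frac{\partial^2 u}{\partial x^2} &= r^m\frac{\partial^2 F}{\partial x^2} + mr^{m-1}\frac{\partial r}{\partial x}\frac{\partial F}{\partial x} + mr^{m-2}x\frac{\partial F}{\partial x} + mr^{m-2}F + m(m - 2)r^{m-3}\frac{\partial r}{\partial x}xF \\
&= r^m\frac{\partial^2 F}{\partial x^2} + 2mr^{m-2}x\frac{\partial F}{\partial x} + mr^{m-2}F + m(m - 2)r^{m-4}x^2F.
\end{aligned}$$

Because of the symmetrical occurrence of x and y in r, we can write down at
once

$$\frac{\partial^2 u}{\partial y^2} = r^m\frac{\partial^2 F}{\partial y^2} + 2mr^{m-2}y\frac{\partial F}{\partial y} + mr^{m-2}F + m(m - 2)r^{m-4}y^2F.$$

If we now add, and take note of the relations

$$x\frac{\partial F}{\partial x} + y\frac{\partial F}{\partial y} = 2F, \quad x^2 + y^2 = r^2,$$

we find

$$\begin{aligned}
\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} &= r^m\left(\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2}\right) + 4mr^{m-2}F + 2mr^{m-2}F + m(m - 2)r^{m-2}F \\
&= r^m\left(\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2}\right) + m(m + 4)r^{m-2}F.
\end{aligned}$$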

This is the required result. Question for the student: Where was use made of the 
homogeneity of F? 
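
The identity of Example 2 can be spot-checked numerically. The sketch below rests on assumptions made only for illustration: the homogeneous function $F = x^2 + xy + 2y^2$, the exponent m = 3, the point (1, 0.5), and a five-point finite-difference Laplacian in place of exact second derivatives.

```python
import math

def laplacian(f, x, y, h=1e-3):
    """Five-point central-difference estimate of f_xx + f_yy."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h)
            - 4.0 * f(x, y)) / (h * h)

m = 3  # illustrative exponent

def F(x, y):
    # homogeneous of degree 2 (assumed example)
    return x * x + x * y + 2.0 * y * y

def u(x, y):
    # u = r^m F(x, y) with r^2 = x^2 + y^2
    return math.hypot(x, y) ** m * F(x, y)

x0, y0 = 1.0, 0.5
r0 = math.hypot(x0, y0)
# Left and right members of the identity proved in Example 2.
lhs = laplacian(u, x0, y0)
rhs = r0 ** m * laplacian(F, x0, y0) + m * (m + 4) * r0 ** (m - 2) * F(x0, y0)
```

The two sides agree to within the discretization error. Repeating the experiment with an F that is not homogeneous of degree 2 breaks the agreement, which bears on the question put to the student.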


EXERCISES 

1. (a) If F is positively homogeneous of degree n, and if F, $F_1$, and $F_2$ are
differentiable, the equation

$$x^2\frac{\partial^2 F}{\partial x^2} + 2xy\frac{\partial^2 F}{\partial x\,\partial y} + y^2\frac{\partial^2 F}{\partial y^2} = n(n - 1)F$$

is valid. Prove this by differentiating (6.53-3) partially with respect to t. (b) State and
prove the corresponding result for third derivatives.

2. If F is positively homogeneous of degree n, and differentiable, its first partial 
derivatives are positively homogeneous of degree n - 1. Prove this. 

3. Let H(x, y) be positively homogeneous of degree p, and let $u = r^m H(x, y)$, where
$r^2 = x^2 + y^2$. Show that

$$\Delta u = r^m \Delta H + m(2p + m)r^{m-2}H,$$

where, for any function F(x, y), the notation $\Delta F$ means

$$\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2}.$$

4. The equation $\Delta F = 0$ (see the explanation of notation at the end of Exercise 3) is
called Laplace’s equation in two dimensions. Here it is understood that (x, y) are
rectangular co-ordinates of a point in the plane. If H(x, y) is positively homogeneous of
degree p, and $\Delta H = 0$, show that $\Delta(r^{-2p}H) = 0$ also. Use Exercise 3. Examples are
furnished by taking H(x, y) to be one of the functions $x$, $y$, $x^2 - y^2$, $2xy$, $3x^2y - y^3$, with
appropriate values of p in each case.

5. (a) If x, y, z are rectangular co-ordinates in space, we write

$$\Delta F(x, y, z) = \frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2} + \frac{\partial^2 F}{\partial z^2}.$$

If $u = r^m H(x, y, z)$, where $r^2 = x^2 + y^2 + z^2$, and if H is positively homogeneous of degree
p, work out the result analogous to that of Exercise 3. (b) What is the result for three
dimensions comparable to that of Exercise 4? (c) Develop generalizations of the results
of (a) and (b) for the case of n dimensions, taking $r^2 = x_1^2 + \cdots + x_n^2$.

6. Suppose that F(x, y) is defined and differentiable in an open region R, and suppose
that $x\,\partial F/\partial x + y\,\partial F/\partial y = nF(x, y)$ at each point of the region. Then, if (x, y) is in R, the
relation $F(tx, ty) = t^n F(x, y)$ holds in any interval $t_0 < t < t_1$ (where $t_0 \ge 0$) provided that
t = 1 is in this interval and provided that, for all such t, the points (tx, ty) are in R. To
prove this converse of Euler’s theorem, let f(t) = F(tx, ty), where (x, y) is a fixed point of
R. Use the hypothesis on F to prove that $tf'(t) = nf(t)$. From this, infer that $f(t)t^{-n}$ is a
constant (depending on x, y). Then complete the proof.


6.6 / DERIVATIVES OF IMPLICIT FUNCTIONS 

In §6.1 we dealt with the differentiation of implicit functions in a variety of 
particular situations. We did not attempt to deal with general cases in which the 
functions were merely indicated by some functional symbol. In practice it is 
necessary to have formulas to deal with implicit functions in terms of general 
notation. 

A simple but typical case is that arising when z is defined as a function of x, 
y by an equation of the form 

F(x, y, z) = 0. (6.6-1) 

Suppose, for instance, that the equation is

$$x^2 + 2xz + z^2 - yz - 1 = 0,$$

so that in this case

$$F(x, y, z) = x^2 + 2xz + z^2 - yz - 1. \tag{6.6-2}$$

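Proceeding as in §6.1, we have

$$2x + 2z + 2x\frac{\partial z}{\partial x} + 2z\frac{\partial z}{\partial x} - y\frac{\partial z}{\partial x} = 0,$$

$$2x\frac{\partial z}{\partial y} + 2z\frac{\partial z}{\partial y} - y\frac{\partial z}{\partial y} - z = 0,$$

$$\frac{\partial z}{\partial x} = -\frac{2x + 2z}{2x + 2z - y}, \quad \frac{\partial z}{\partial y} = -\frac{-z}{2x + 2z - y}. \tag{6.6-3}$$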


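Now let us observe that if we regard x, y, z as independent in (6.6-2), then

$$\frac{\partial F}{\partial x} = 2x + 2z, \quad \frac{\partial F}{\partial y} = -z, \quad \frac{\partial F}{\partial z} = 2x + 2z - y.$$

Accordingly, equations (6.6-3) take on the form

$$\frac{\partial z}{\partial x} = -\frac{\partial F/\partial x}{\partial F/\partial z}, \quad \frac{\partial z}{\partial y} = -\frac{\partial F/\partial y}{\partial F/\partial z}. \tag{6.6-4}$$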


We are going to show that these formulas are general; that is, they hold when z 
is defined by (6.6-1), subject to certain general assumptions on the otherwise 
arbitrary function F. 

We are not just now concerned with knowing what assumptions to make 
about F in order to guarantee that equation (6.6-1) does in fact define z as a 
function of x, y. Questions of this kind will be dealt with in Chapter 8. We 
assume that z is a well-defined function of x, y, and that equation (6.6-1) is an 
identity in x, y when z is replaced by its functional value: 

$$z = f(x, y), \tag{6.6-5}$$

$$F(x, y, f(x, y)) = 0. \tag{6.6-6}$$

A simple instance of these last relations would be afforded by

$$z = \sqrt{1 - x^2 - y^2}, \quad F(x, y, z) = x^2 + y^2 + z^2 - 1.$$

Another instance is

$$z = \log xy, \quad F(x, y, z) = xy - e^z.$$

We also assume that F is a differentiable function of the three independent
variables and that f(x, y) has first partial derivatives. Under these conditions we
shall prove formulas (6.6-4). Naturally we must assume that the denominator
$\partial F/\partial z$ is not zero.

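Under the foregoing hypotheses we look upon G(x, y) = F(x, y, f(x, y)) as a
composite function of x, y. The first-class variables are x, y, z, and the
second-class variables are x, y. The relations between the variables of the two
classes are

$$x = x, \quad y = y, \quad z = f(x, y).$$

Therefore,

$$\frac{\partial x}{\partial x} = 1, \quad \frac{\partial x}{\partial y} = 0, \quad \frac{\partial y}{\partial x} = 0, \quad \frac{\partial y}{\partial y} = 1.$$

The chain rule gives

$$\frac{\partial G}{\partial x} = F_1 + F_3\frac{\partial z}{\partial x}.$$

Likewise

$$\frac{\partial G}{\partial y} = F_2 + F_3\frac{\partial z}{\partial y}.$$

Now by (6.6-6), G(x, y) = 0, and $\frac{\partial G}{\partial x} = \frac{\partial G}{\partial y} = 0$. Therefore, assuming that $F_3 \ne 0$,
we have

$$\frac{\partial z}{\partial x} = -\frac{F_1}{F_3}, \quad \frac{\partial z}{\partial y} = -\frac{F_2}{F_3}.$$

These are the same equations as (6.6-4) except for the difference in notation. It
is understood that the derivatives $F_1$, $F_2$, $F_3$ are evaluated with z = f(x, y).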
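
Formulas (6.6-4) can be tested on the function (6.6-2). The sketch below is illustrative only: it assumes Newton's method started at z = 1 picks out one branch of the implicit function near the sample point (0.3, 0.2), and it compares finite-difference slopes of that branch with the quotients of partial derivatives. The helper `solve_z` and the chosen point are not from the text.

```python
def F(x, y, z):
    # the function of (6.6-2)
    return x * x + 2 * x * z + z * z - y * z - 1

def F_x(x, y, z):
    return 2 * x + 2 * z

def F_y(x, y, z):
    return -z

def F_z(x, y, z):
    return 2 * x + 2 * z - y

def solve_z(x, y, z_start=1.0, steps=50):
    """Newton's method in z for F(x, y, z) = 0, started near the branch z = 1."""
    z = z_start
    for _ in range(steps):
        z -= F(x, y, z) / F_z(x, y, z)
    return z

x0, y0 = 0.3, 0.2
z0 = solve_z(x0, y0)
h = 1e-5
# Slopes of the implicit function estimated by central differences.
dzdx_numeric = (solve_z(x0 + h, y0) - solve_z(x0 - h, y0)) / (2 * h)
dzdy_numeric = (solve_z(x0, y0 + h) - solve_z(x0, y0 - h)) / (2 * h)
# Slopes predicted by formulas (6.6-4).
dzdx_formula = -F_x(x0, y0, z0) / F_z(x0, y0, z0)
dzdy_formula = -F_y(x0, y0, z0) / F_z(x0, y0, z0)
```

The numerical and the predicted slopes agree closely, provided $\partial F/\partial z$ stays away from zero along the way.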

It is easily seen that the foregoing considerations may be generalized as
follows. Suppose a function of n variables, say $u = f(x_1, \ldots, x_n)$, is determined
implicitly as a solution of an equation $F(x_1, \ldots, x_n, u) = 0$. Then, under suitable
assumptions of differentiability, we have

$$\frac{\partial f}{\partial x_i} = -\frac{F_i(x_1, \ldots, x_n, f(x_1, \ldots, x_n))}{F_{n+1}(x_1, \ldots, x_n, f(x_1, \ldots, x_n))}, \quad i = 1, \ldots, n. \tag{6.6-7}$$


Example 1. Show that if z = f(x, y) is a solution of the equation F(x, y, z) =
0, then the line normal to the surface z = f(x, y) at a given point has direction
ratios $F_1 : F_2 : F_3$, the partial derivatives being evaluated at the point in question.
It was shown in §6.2 that the line normal to the surface has direction ratios

$$\frac{\partial z}{\partial x} : \frac{\partial z}{\partial y} : -1.$$

By (6.6-4) an equivalent set of ratios is

$$-\frac{F_1}{F_3} : -\frac{F_2}{F_3} : -1.$$

The ratios will not be altered if we multiply by $-F_3$. In this way we obtain the
ratios $F_1 : F_2 : F_3$ for the direction of the normal line.


Next let us consider the case of two functions which arise as solutions of a 
pair of simultaneous equations. Suppose, for instance, that 


$$u = f(x, y, z) \quad \text{and} \quad v = g(x, y, z) \tag{6.6-8}$$

are solutions of

$$F(x, y, z, u, v) = 0, \quad G(x, y, z, u, v) = 0. \tag{6.6-9}$$

That is, suppose that

$$F(x, y, z, f(x, y, z), g(x, y, z)) \equiv 0 \tag{6.6-10}$$

is true for all values of x, y, z in some region, with a similar relation for G. We
assume that the functions F, G, f, g are all differentiable. The problem which we
pose is that of expressing the partial derivatives of f and g in terms of those of F
and G. As usual, we write


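$$F_1 = \frac{\partial F}{\partial x}, \quad F_2 = \frac{\partial F}{\partial y}, \quad \ldots, \quad F_5 = \frac{\partial F}{\partial v}.$$

The required formulas are then of the form

$$\frac{\partial u}{\partial x} = -\frac{\begin{vmatrix} F_1 & F_5 \\ G_1 & G_5 \end{vmatrix}}{\begin{vmatrix} F_4 & F_5 \\ G_4 & G_5 \end{vmatrix}}, \quad \ldots, \quad \frac{\partial u}{\partial z} = -\frac{\begin{vmatrix} F_3 & F_5 \\ G_3 & G_5 \end{vmatrix}}{\begin{vmatrix} F_4 & F_5 \\ G_4 & G_5 \end{vmatrix}},$$

$$\frac{\partial v}{\partial x} = -\frac{\begin{vmatrix} F_4 & F_1 \\ G_4 & G_1 \end{vmatrix}}{\begin{vmatrix} F_4 & F_5 \\ G_4 & G_5 \end{vmatrix}}, \quad \ldots, \quad \frac{\partial v}{\partial z} = -\frac{\begin{vmatrix} F_4 & F_3 \\ G_4 & G_3 \end{vmatrix}}{\begin{vmatrix} F_4 & F_5 \\ G_4 & G_5 \end{vmatrix}}. \tag{6.6-11}$$
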


This is on the assumption that the determinant appearing in the denominators is
not equal to zero. The expressions $F_1, \ldots, G_5$ are evaluated with u and v given
by (6.6-8).

We shall indicate briefly the manner of deriving formulas (6.6-11). We regard
x, y, z, u, v as first-class variables, and x, y, z as second-class variables. Then,
from the identity (6.6-10), we have (differentiating with respect to x)

$$F_1 + F_4\frac{\partial u}{\partial x} + F_5\frac{\partial v}{\partial x} = 0.$$

There is a similar equation involving G:

$$G_1 + G_4\frac{\partial u}{\partial x} + G_5\frac{\partial v}{\partial x} = 0.$$

These two linear equations are now solved for $\frac{\partial u}{\partial x}$ and $\frac{\partial v}{\partial x}$, with the results
indicated in (6.6-11). The procedure is entirely similar for finding the derivatives
of u and v with respect to y or z.

The student need not memorize the formulas (6.6-11), though it is useful to 
have ready reference to such formulas. It is the procedure for deriving the 
formulas which is important, and the student should be able to carry out the 
derivation himself. 
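
The derivation procedure amounts to solving a pair of linear equations by Cramer's rule, and that step is easy to sketch in code. The example below is an assumption made for illustration, not one taken from the text: it uses only two independent variables x, y, with $F = u\cos v - x$ and $G = u\sin v - y$, whose solutions are the polar co-ordinates $u = \sqrt{x^2 + y^2}$, $v = \arctan(y/x)$. The function and argument names are hypothetical.

```python
import math

def det2(a, b, c, d):
    """Determinant of the 2x2 array [[a, b], [c, d]]."""
    return a * d - b * c

def implicit_partials_x(F_x, G_x, F_u, F_v, G_u, G_v):
    """Solve F_x + F_u*u_x + F_v*v_x = 0 and G_x + G_u*u_x + G_v*v_x = 0
    for u_x and v_x by Cramer's rule, following the pattern of (6.6-11)."""
    jac = det2(F_u, F_v, G_u, G_v)          # Jacobian of F, G in u, v
    if jac == 0.0:
        raise ZeroDivisionError("the Jacobian of F, G with respect to u, v vanishes")
    u_x = -det2(F_x, F_v, G_x, G_v) / jac
    v_x = -det2(F_u, F_x, G_u, G_x) / jac
    return u_x, v_x

# Assumed example: F = u cos v - x, G = u sin v - y (polar co-ordinates).
x, y = 1.5, 2.0
u, v = math.hypot(x, y), math.atan2(y, x)
u_x, v_x = implicit_partials_x(
    F_x=-1.0, G_x=0.0,
    F_u=math.cos(v), F_v=-u * math.sin(v),
    G_u=math.sin(v), G_v=u * math.cos(v),
)
# Direct differentiation of the explicit solutions, for comparison.
expected_u_x = x / u
expected_v_x = -y / (u * u)
```

The values obtained from the determinants coincide with those found by differentiating the explicit solutions, which is the kind of verification asked for in Exercise 5.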

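Determinants such as those occurring in the formulas (6.6-11) are called
Jacobians, after Carl Jacobi, a prominent German mathematician of the nineteenth
century. There is a more compact notation which is often used for a Jacobian:

$$\frac{\partial(F, G)}{\partial(u, v)} = \begin{vmatrix} \dfrac{\partial F}{\partial u} & \dfrac{\partial F}{\partial v} \\[2ex] \dfrac{\partial G}{\partial u} & \dfrac{\partial G}{\partial v} \end{vmatrix}.$$

This is a Jacobian of second order. The general
Jacobian involves n functions, each of n variables:

$$\frac{\partial(F_1, \ldots, F_n)}{\partial(u_1, \ldots, u_n)} = \begin{vmatrix} \dfrac{\partial F_1}{\partial u_1} & \dfrac{\partial F_1}{\partial u_2} & \cdots & \dfrac{\partial F_1}{\partial u_n} \\[2ex] \dfrac{\partial F_2}{\partial u_1} & \dfrac{\partial F_2}{\partial u_2} & \cdots & \dfrac{\partial F_2}{\partial u_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial F_n}{\partial u_1} & \dfrac{\partial F_n}{\partial u_2} & \cdots & \dfrac{\partial F_n}{\partial u_n} \end{vmatrix}.$$
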

In concluding this section we emphasize that here we are concerned with the 
structure of general formulas for the derivatives of implicit functions. Questions 
of existence and differentiability of implicit functions are taken up in Chapter 8. 


EXERCISES 

1. If the equations

$$F(x, y, u, v, w) = 0, \quad G(x, y, u, v, w) = 0, \quad H(x, y, u, v, w) = 0$$

have solutions for u, v, and w as functions of x and y, show, taking for granted certain
general conditions, that

$$\frac{\partial u}{\partial x} = -\frac{\dfrac{\partial(F, G, H)}{\partial(x, v, w)}}{\dfrac{\partial(F, G, H)}{\partial(u, v, w)}}.$$

Write five other similar formulas.

2. Suppose y = f(x) is a solution of G(x, y) = 0, where $\partial G/\partial y \ne 0$, and let g(x) =
F(x, f(x)). Show that $g'(x) = (F_1G_2 - F_2G_1)/G_2$, the right side being evaluated with y = f(x).

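3. If z = f(x, y) is a solution of F(x, y, z) = 0 (with $F_3 \ne 0$), and if H(x, y) =
G(x, y, f(x, y)), show that $\dfrac{\partial F}{\partial z}\dfrac{\partial H}{\partial y} = -\dfrac{\partial(F, G)}{\partial(y, z)}$. Write a similar formula involving $\dfrac{\partial H}{\partial x}$.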

4. Suppose u = f(x, y, z), v = g(x, y, z) are solutions of F(x, y, z, u, v) = 0,
G(x, y, z, u, v) = 0. Let K(x, y, z) = H(x, y, z, f(x, y, z), g(x, y, z)). Show that

$$\frac{\partial K}{\partial z} = \frac{\dfrac{\partial(F, G, H)}{\partial(z, u, v)}}{\dfrac{\partial(F, G)}{\partial(u, v)}}$$

under suitable conditions.


5. As an instance of (6.6-8) and (6.6-9) consider u = x sin xyz, v = y cos xyz as
solutions of $yu - xv\tan xyz = 0$, $y^2u^2 + x^2v^2 - x^2y^2 = 0$. Verify all six of the formulas
(6.6-11) in this case.

6. If the equation F(x, y, z) = 0 can be solved for each one of the variables in terms
of the other two, show, taking for granted certain general conditions, that

$$\frac{\partial x}{\partial y}\frac{\partial y}{\partial z}\frac{\partial z}{\partial x} = -1.$$

7. Let G(x, y, z, v) = 0 have solutions x = f(y, z, v), y = g(x, z, v), z = h(x, y, v), and
by means of one of these equations at a time let u = F(x, y, z) become a function of
(y, z, v), (x, z, v), and (x, y, v) respectively. Show that, subject to certain conditions,

$$\left(\frac{\partial u}{\partial y}\right)_{x,v} = \left(\frac{\partial u}{\partial y}\right)_{z,v} + \left(\frac{\partial u}{\partial z}\right)_{x,y}\left(\frac{\partial z}{\partial y}\right)_{x,v} - \left(\frac{\partial u}{\partial x}\right)_{y,z}\left(\frac{\partial x}{\partial y}\right)_{z,v}.$$

An example is furnished by $xyv - z = 0$, $u = x^2 + y^2 + z^2$, and the student should check the
meaning of the problem in terms of this special case if he feels an illustration to be
necessary.

8. Suppose u = f(x, y) is a solution of F(x, y, u) = 0, and that y = g(x, z) is a
solution of G(x, y, z) = 0. Let H(x, z) = f(x, g(x, z)). Show that $F_3G_2H_2 = F_2G_3$ and
$F_3G_2H_1 = F_2G_1 - F_1G_2$ are identities in x and z.

9. (a) Starting from (6.6-4), show that

$$\frac{\partial^2 z}{\partial x^2} = \frac{F_3^2F_{11} - 2F_1F_3F_{13} + F_1^2F_{33}}{-F_3^3}.$$

(b) Derive analogous formulas for $\dfrac{\partial^2 z}{\partial y^2}$ and $\dfrac{\partial^2 z}{\partial x\,\partial y}$.

10. If z = f(x, y) satisfies an equation of the form z = F(ax + by + cz), where a, b,
and c are constants, show that $b\dfrac{\partial z}{\partial x} = a\dfrac{\partial z}{\partial y}$.

11. If z = f(x, y) satisfies an equation of the form $F(x + y + z, x^2 + y^2 + z^2) = 0$, show
that $(y - x) + (y - z)\dfrac{\partial z}{\partial x} + (z - x)\dfrac{\partial z}{\partial y} = 0$.

12. Suppose that the function z = f(x, y) satisfies an equation of the form
$F(ax + by + cz, Ax^2 + By^2 + Cz^2) = 0$, where a, b, c and A, B, C are constants. Show
that

$$\frac{\partial z}{\partial x} = -\frac{aF_1 + 2AxF_2}{cF_1 + 2CzF_2}.$$

13. If $z = \phi(x, y)$ satisfies the equation F(f(x, y, z), g(x, y, z)) = 0, show that

$$\frac{\partial z}{\partial y} = -\frac{F_1f_2 + F_2g_2}{F_1f_3 + F_2g_3}.$$


14. If $G_1(x_1, x_2, y)$, $G_2(x_1, x_2, y)$ and $f(x_1, x_2)$ are given, and if $g_i(x_1, x_2) =
G_i(x_1, x_2, f(x_1, x_2))$ (i = 1, 2), show that

$$\frac{\partial(g_1, g_2)}{\partial(x_1, x_2)} = \frac{\partial(G_1, G_2)}{\partial(x_1, x_2)} + \frac{\partial f}{\partial x_1}\frac{\partial(G_1, G_2)}{\partial(y, x_2)} + \frac{\partial f}{\partial x_2}\frac{\partial(G_1, G_2)}{\partial(x_1, y)},$$

with y replaced by $f(x_1, x_2)$ after the differentiations. This formula is used in the theory
of first-order partial differential equations.


6.7 / EXTREMAL PROBLEMS WITH CONSTRAINTS 

Many interesting maximum or minimum problems arise in such a form that we 
are required to find an extremal value of a function, say F(x, y, z), where the 
variables x, y, z are not independent of each other, but are restricted by some 
relation existing between them, this relation being expressed by an equation 
G(x, y, z) = 0.

Example 1. Find the minimum value of

$$F(x, y, z) = (x - 3)^2 + (y - 2)^2 + (z - 1)^2$$

subject to the condition

$$G(x, y, z) = 2x - 3y - 4z - 25 = 0.$$

This problem has already been solved; see Example 1, §6.3. 

The equation G(x, y, z) = 0 is called a constraint on the variables x, y, z. It is
immaterial whether the equation of constraint has the form G(x, y, z) = 0 or
G(x, y, z) = k, where k is a specified constant, for the latter form of constraint
can be written G(x, y, z) - k = 0.

An extremal problem with constraint may occur with any number of
variables, and there may be more than one equation of constraint.

Example 2. Find maximum and minimum values of x 2 + y 2 + z 2 subject to the 
two conditions 

r 2 

x f y - z = 0, + y 2 + z 2 = 1. 

In this problem there are two constraints on the three variables, so that there 
is actually only one independent variable. 

Example 3. Find the minimum value of $(x - u)^2 + (y - v)^2 + z^2$ subject to the
condition $3x^2 + y^2 - 6x - 4y - 12z + 43 = 0$.

In this problem there are five variables and one constraint, which happens to 
involve only three of the five variables. 

There are various methods for dealing with extremal problems with
constraints. One method is to use the equation or equations of constraint to express
certain of the variables in terms of the remaining variables. These latter
variables are chosen as the independent variables, and the function whose
extremal value is sought is then expressed in terms of the independent variables
only. The solution is then carried out by standard methods. We shall call this
general method the method of direct elimination. It is illustrated, in the case of
the problem of Example 1 in the present section, by the solution given in
Example 1, §6.3.

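A second method may conveniently be called the method of implicit
functions. Suppose the problem is to find the point or points at which F(x, y, z) has
extreme values subject to the condition G(x, y, z) = k. Here we assume that F
and G are given functions with continuous first partial derivatives, and that k is
a given constant. Let us assume, for definiteness, that $\partial G/\partial z \ne 0$ and that the
equation

$$G(x, y, z) - k = 0 \tag{6.7-1}$$

has a solution

$$z = f(x, y). \tag{6.7-2}$$

We then seek to make the quantity

$$u = F(x, y, f(x, y))$$

a maximum or minimum. Accordingly we want to solve the equations

$$\frac{\partial u}{\partial x} = 0, \quad \frac{\partial u}{\partial y} = 0. \tag{6.7-3}$$

Now

$$\frac{\partial u}{\partial x} = \frac{\partial F}{\partial x} + \frac{\partial F}{\partial z}\frac{\partial f}{\partial x}, \quad \frac{\partial u}{\partial y} = \frac{\partial F}{\partial y} + \frac{\partial F}{\partial z}\frac{\partial f}{\partial y}, \tag{6.7-4}$$

where z is replaced by f(x, y) after the differentiations are performed.
We also have the identity

$$G(x, y, f(x, y)) - k = 0,$$

from which it follows by differentiation that

$$\frac{\partial G}{\partial x} + \frac{\partial G}{\partial z}\frac{\partial f}{\partial x} = 0, \quad \frac{\partial G}{\partial y} + \frac{\partial G}{\partial z}\frac{\partial f}{\partial y} = 0. \tag{6.7-5}$$

If we solve these equations for $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ and substitute in (6.7-4), we obtain the
equation

$$\frac{\partial u}{\partial x} = \frac{\dfrac{\partial F}{\partial x}\dfrac{\partial G}{\partial z} - \dfrac{\partial F}{\partial z}\dfrac{\partial G}{\partial x}}{\dfrac{\partial G}{\partial z}}$$

and a similar equation for $\frac{\partial u}{\partial y}$. Equations (6.7-3) now take the form

$$\frac{\partial F}{\partial x}\frac{\partial G}{\partial z} - \frac{\partial F}{\partial z}\frac{\partial G}{\partial x} = 0, \quad \frac{\partial F}{\partial y}\frac{\partial G}{\partial z} - \frac{\partial F}{\partial z}\frac{\partial G}{\partial y} = 0, \tag{6.7-6}$$

in which z is replaced by f(x, y) after the differentiations are performed. Now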
the method of implicit functions for this extremal problem with constraint may 
be described as follows: We do not actually solve for z at the outset. Instead, 
we carry the work along and arrive at equations (6.7-6) as equations in all three 
variables. These two equations, together with the constraint (6.7-1), give us three 
equations which we solve as simultaneous equations in x, y, z. The required 
points of extreme value will be among the points found in this way. This general 
assertion is subject to some qualifications to rule out exceptional cases. We 
could, of course, think of y as a function of x and z, or of x as a function of y 
and z, the functional relation in each case being determined by the constraint. 
These alternatives would give us pairs of equations different from (6.7-6), but 
equivalent to them. 

The implicit-function method was used in the solution of the problem of
Example 3, §6.3, with y regarded as a function of x and z. Our purpose just now
is not to illustrate the method in particular cases, but to obtain the equations 
(6.7-6) in preparation for study of other aspects of extremal problems with 
constraints. 
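
To see equations (6.7-6) at work, they can be written out for Example 1 of this section. With $F = (x - 3)^2 + (y - 2)^2 + (z - 1)^2$ and $G = 2x - 3y - 4z - 25$, a short calculation (done here for the sketch, not quoted from §6.3) reduces (6.7-6) to $8x + 4z = 28$ and $8y - 6z = 10$, to which the constraint $2x - 3y - 4z = 25$ is added. The code below solves this linear system by Cramer's rule; the helper names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(A, rhs):
    """Solve the 3x3 linear system A * unknowns = rhs by Cramer's rule."""
    D = det3(A)
    solution = []
    for col in range(3):
        M = [row[:] for row in A]       # copy, then replace one column by rhs
        for r in range(3):
            M[r][col] = rhs[r]
        solution.append(det3(M) / D)
    return solution

# Rows: the two equations from (6.7-6), then the constraint.
A = [[8.0, 0.0, 4.0],
     [0.0, 8.0, -6.0],
     [2.0, -3.0, -4.0]]
rhs = [28.0, 10.0, 25.0]
x, y, z = solve3(A, rhs)
minimum = (x - 3) ** 2 + (y - 2) ** 2 + (z - 1) ** 2
```

The solution is the point (5, -1, -3), where F equals 29; this is the squared distance from (3, 2, 1) to the plane, as the geometry of the problem leads one to expect.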

It is often useful to regard an extremal problem with constraint from a 
geometrical point of view. Consider, for instance, the problem of extremal value 
for the function F(x, y, z) with the constraint G(x, y, z)= k (a given constant). 
We shall suppose that the equation G(x, y, z)= k defines a surface S possessing 
a tangent plane at each of its points with which we are concerned. We then 
consider the values of the function F at points of the surface S; we wish to find 
the points of S where F has maximum or minimum values. Now consider the 
family of surfaces 

F(x, y, z) = C, (6.7-7) 

where C is a parameter. Suppose that M is the absolute maximum value attained
by F at points of S. Then, if C > M, the surface (6.7-7) and the surface S will
have no points in common, whereas if C = M the two surfaces will have at least
one point $(x_0, y_0, z_0)$ in common, and this will be a point where F attains its
maximum value on S. We shall prove that the two surfaces G(x, y, z) = k and
F(x, y, z) = M are tangent at this point $(x_0, y_0, z_0)$. The direction ratios of the
normals to the two surfaces are, respectively,

$$G_1 : G_2 : G_3 \quad \text{and} \quad F_1 : F_2 : F_3,$$

with the partial derivatives evaluated at $(x_0, y_0, z_0)$ (see Example 1, §6.6). The
condition of tangency is therefore that these two sets of ratios be the same, i.e.,
that $G_1, G_2, G_3$ and $F_1, F_2, F_3$ be proportional. Now we saw earlier that equations
(6.7-6) must hold at a point of extreme value, provided $G_3 \ne 0$. These equations
may be written

$$F_1G_3 - F_3G_1 = 0, \quad F_2G_3 - F_3G_2 = 0,$$

or

$$F_1 = \frac{F_3}{G_3}G_1, \quad F_2 = \frac{F_3}{G_3}G_2,$$

whence it follows that $F_1, F_2, F_3$ and $G_1, G_2, G_3$ are proportional at $(x_0, y_0, z_0)$. If
it should happen that $G_3 = 0$ at the point, the same conclusion of proportionality
can be reached by assuming that $G_2 \ne 0$ or $G_1 \ne 0$.

The argument and the conclusion are essentially the same in the case of a 
minimum rather than a maximum. Also, the surfaces will be tangent at a point of 
relative extreme value of F on the surface S. There may also be tangencies at 
points where F is neither a maximum nor a minimum. We sum up our findings in 
a formal statement. 

THEOREM V. Suppose the functions F and G have continuous first partial
derivatives throughout a certain region of space. Let the equation
G(x, y, z) = k define a surface S, every point of which is in the interior of the
region, and suppose that the three partial derivatives $G_1, G_2, G_3$ are never
simultaneously zero at a point of S. Then a necessary condition for the
values of F(x, y, z) on S to attain an extreme value (either relative or
absolute) at a point of S is that $F_1, F_2, F_3$ be proportional to $G_1, G_2, G_3$ at
that point. If C is the value of F at the point, and if the constant of
proportionality is not zero, the geometrical meaning of the proportionality is
that the surface S and the surface F(x, y, z) = C are tangent at the point in
question.

A fully detailed analytic proof of this theorem depends upon the theory of 
implicit functions, as developed in Chapter 8. The geometric arguments which 
precede the statement of the theorem do not constitute a rigorous proof, but 
they give a good basis for understanding the theorem; with adequate knowledge 
of implicit-function theory, the argument given here can be developed into a 
complete proof. 

Example 4. Consider the maximum and minimum values of F(x, y, z) =
$x^2 + y^2 + z^2$ on the surface of the ellipsoid

$$G(x, y, z) = \frac{x^2}{64} + \frac{y^2}{36} + \frac{z^2}{25} = 1.$$

Since F(x, y, z) is the square of the distance from (x, y, z) to the origin, it is
clear that we are looking for the points on the ellipsoid at maximum and
minimum distances from the center of the ellipsoid. The maximum occurs at the
ends of the longest principal axis, namely at (±8, 0, 0). The minimum occurs at
the ends of the shortest principal axis, namely at (0, 0, ±5). Consider the
maximum point (8, 0, 0). The value of F at this point is 64, and the surface
F(x, y, z) = 64 is a sphere. The sphere and the ellipsoid are tangent at (8, 0, 0), as
asserted in the general theory. In this case the ratios $G_1 : G_2 : G_3$ and $F_1 : F_2 : F_3$ at
(8, 0, 0) are 1:0:0 and 16:0:0, respectively.

This example brings out the fact that the tangency of the surfaces, or the
proportionality of the two sets of ratios, is a necessary but not a sufficient
condition for a maximum or minimum value of F; for the condition of
proportionality exists at the points (0, ±6, 0), which are the ends of the principal axis of
intermediate length. But the value of F is neither a maximum nor a minimum at
these points, as the student can readily see by considering the geometrical
situation.
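
The behaviour at (0, 6, 0) can be examined numerically. The sketch below assumes the standard parametrization $x = 8\sin\theta\cos\phi$, $y = 6\sin\theta\sin\phi$, $z = 5\cos\theta$ of the ellipsoid of Example 4; the step 0.1 and the function names are arbitrary illustrative choices. It checks that the gradients of F and G are parallel at (0, 6, 0) and that F nevertheless takes values on both sides of 36 nearby.

```python
import math

A2, B2, C2 = 64.0, 36.0, 25.0          # squares of the semiaxes 8, 6, 5

def grad_F(x, y, z):
    # F = x^2 + y^2 + z^2
    return (2 * x, 2 * y, 2 * z)

def grad_G(x, y, z):
    # G = x^2/64 + y^2/36 + z^2/25
    return (2 * x / A2, 2 * y / B2, 2 * z / C2)

def cross(a, b):
    """Cross product; it is the zero vector when a and b are proportional."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def F_on_ellipsoid(theta, phi):
    """Value of F at the ellipsoid point with parameters theta, phi."""
    x = 8 * math.sin(theta) * math.cos(phi)
    y = 6 * math.sin(theta) * math.sin(phi)
    z = 5 * math.cos(theta)
    return x * x + y * y + z * z

p = (0.0, 6.0, 0.0)                    # theta = pi/2, phi = pi/2
cross_at_p = cross(grad_F(*p), grad_G(*p))

eps = 0.1
toward_long_axis = F_on_ellipsoid(math.pi / 2, math.pi / 2 - eps)
toward_short_axis = F_on_ellipsoid(math.pi / 2 - eps, math.pi / 2)
```

Moving along the surface toward the long axis raises F above 36 and moving toward the short axis lowers it, so the point behaves like a saddle for F on S even though the proportionality condition holds there.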

A similar geometrical interpretation can be given to the problem of extremal 
values for F(x, y) subject to a constraint G(x, y) = k. Here we have a curve 
defined by the constraint, and a one-parameter family of curves F(x, y) = C. At a 
point of extremal value of F the curve F(x, y)= C through the point will be 
tangent to the curve defined by the constraint. Suitable hypotheses must be 
made to rule out singular cases. 

Example 5. Find the maximum value of F(x, y) = xy subject to the constraint
$x^2 + y^2 = 8$.

Here we have a fixed circle; the curves xy = C are hyperbolas. The
hyperbola xy = 4 is tangent to the circle at the points (2, 2) and (-2, -2) (see Fig. 45).
These are the points at which the maximum value is attained.
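
A brute-force scan of the circle gives a simple confirmation of Example 5. This is only a sketch: it assumes the parametrization $x = \sqrt8\cos\theta$, $y = \sqrt8\sin\theta$ and a grid of 3600 angles, both chosen for illustration.

```python
import math

R = math.sqrt(8.0)                      # radius of the circle x^2 + y^2 = 8

def product_on_circle(theta):
    """Value of xy at the point of the circle with polar angle theta."""
    x, y = R * math.cos(theta), R * math.sin(theta)
    return x * y

N = 3600
samples = [(product_on_circle(2 * math.pi * k / N), 2 * math.pi * k / N)
           for k in range(N)]
# Largest sampled value of xy and the angle at which it occurs.
max_value, best_theta = max(samples)
best_point = (R * math.cos(best_theta), R * math.sin(best_theta))
```

The largest sampled value is 4, reached at one of the two points (2, 2), (-2, -2) where the hyperbola xy = 4 touches the circle.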

An advantage of the geometrical point of view is that we can often use it to 
conclude with certainty that F actually attains an absolute maximum or mini- 
mum subject to the given constraint. Suppose, for example, that the surface S 
forms a closed and bounded set of points in space. Then, since F is continuous 
at the points of S, we can infer that F actually attains an absolute maximum and 
an absolute minimum on S. This inference is based on a theorem analogous to 
Theorem III, §3.2, and Theorem II, §5.3. The general theorem covering this 
situation is Theorem IV, §17.3. 

Once we know that a maximum or minimum exists, we apply one of the
methods which leads us to solve certain equations. If these equations have just a
few solutions, as they often do in practice, it is an easy matter to tell which
solutions give us the absolute maximum and which the absolute minimum.

6.8 / LAGRANGE’S METHOD 

In §6.7 we discussed extremal problems with constraints, and described two
methods for handling such problems. We referred to these methods as (1) the
method of direct elimination, and (2) the method of implicit functions. In this
section we shall explain and illustrate a method known as Lagrange’s method; it
was devised by the great 18th-century mathematician Joseph Louis Lagrange. Our
explanations depend upon Theorem V and the discussions leading up to it in §6.7.

In the case of the problem of extremal values of F(x, y, z) subject to a 
constraint G(x, y, z) = k, Lagrange’s method directs us to proceed as follows: 

Form the function

$$u = F(x, y, z) + \lambda G(x, y, z), \tag{6.8-1}$$

where $\lambda$ is a constant, as yet undetermined in value. Treat x, y, z as
independent variables, and write down the conditions

$$\frac{\partial u}{\partial x} = 0, \quad \frac{\partial u}{\partial y} = 0, \quad \frac{\partial u}{\partial z} = 0. \tag{6.8-2}$$

Solve these three equations along with the equation of constraint

$$G(x, y, z) = k \tag{6.8-3}$$

to find the values of the four quantities x, y, z, $\lambda$. More than one point (x, y, z)
may be found in this way, but among the points so found will be the points of
extremal values of F.

To understand the reason for the validity of Lagrange’s method, observe
that the equations (6.8-2) are precisely

$$F_1 + \lambda G_1 = 0, \quad F_2 + \lambda G_2 = 0, \quad F_3 + \lambda G_3 = 0. \tag{6.8-4}$$

Here $\lambda$ is a certain constant. These equations state, therefore, that at a point
where they are satisfied, $F_1$, $F_2$, and $F_3$ are proportional to $G_1$, $G_2$, and $G_3$. But
we know from Theorem V that such proportionality occurs at the points of the
surface G(x, y, z) = k where F has an extreme value. Thus the points of extreme
value will be among those found by solving the four simultaneous equations
(6.8-3) and (6.8-4). Thus Lagrange’s method is justified in this type of problem.

The parameter $\lambda$ occurring in Lagrange’s method is called Lagrange’s
multiplier.

One of the great advantages of Lagrange’s method over the method of 
implicit functions or the method of direct elimination is that it enables us to 
avoid making a choice of independent variables. This is sometimes very
important; it permits the retention of symmetry in a problem where the variables
enter symmetrically at the outset. 

Example 1. Find the dimensions of the box of largest volume which can be
fitted inside the ellipsoid

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1, \tag{6.8-5}$$

assuming that each edge of the box is parallel to a co-ordinate axis.

Each of the eight corners of the box will lie on the ellipsoid. Let the corner
in the first octant have co-ordinates (x, y, z); then the dimensions of the box are
2x, 2y, 2z, and its volume is V = 8xyz. We wish to find the absolute maximum of
V subject to the constraint (6.8-5). By the remarks at the end of §6.7 we know
that an absolute maximum exists. Following Lagrange’s method, we set

$$u = 8xyz + \lambda\left(\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2}\right).$$

The equations (6.8-2) in this case are

$$\begin{aligned}
8yz + 2\lambda\frac{x}{a^2} &= 0, \\
8zx + 2\lambda\frac{y}{b^2} &= 0, \\
8xy + 2\lambda\frac{z}{c^2} &= 0.
\end{aligned} \tag{6.8-6}$$

After dividing by 2, let us multiply these three equations by x, y, z respectively,
and add. In view of equation (6.8-5) the result is

$$12xyz + \lambda = 0, \quad \text{or} \quad \lambda = -12xyz.$$

We now put this result back into the first of the equations (6.8-6), and obtain,
after a slight simplification,

$$yz(a^2 - 3x^2) = 0.$$

By symmetry we obtain the two further equations

$$zx(b^2 - 3y^2) = 0, \quad xy(c^2 - 3z^2) = 0.$$

For maximum V it is clear that we want positive values of x, y, and z. The only
possible solutions of (6.8-6) meeting these requirements are thus seen to be

$$x = \frac{a}{\sqrt3}, \quad y = \frac{b}{\sqrt3}, \quad z = \frac{c}{\sqrt3}$$

and

$$\lambda = -12xyz = -\frac{4}{\sqrt3}abc.$$

The box of maximum volume therefore has dimensions

$$\frac{2a}{\sqrt3}, \quad \frac{2b}{\sqrt3}, \quad \frac{2c}{\sqrt3}.$$
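
The result of Example 1 can be checked for particular semiaxes. The sketch below assumes the illustrative values a = 3, b = 2, c = 1 (not taken from the text). It evaluates the residuals of equations (6.8-6) and of the constraint (6.8-5) at the point found above, and compares the volume there with a coarse sampling of first-octant corners on the ellipsoid.

```python
import math

a, b, c = 3.0, 2.0, 1.0                # illustrative semiaxes (assumed values)

# Candidate point and multiplier from the worked solution.
x, y, z = a / math.sqrt(3), b / math.sqrt(3), c / math.sqrt(3)
lam = -12 * x * y * z

residuals = (
    8 * y * z + 2 * lam * x / a ** 2,                       # first of (6.8-6)
    8 * z * x + 2 * lam * y / b ** 2,                       # second of (6.8-6)
    8 * x * y + 2 * lam * z / c ** 2,                       # third of (6.8-6)
    x ** 2 / a ** 2 + y ** 2 / b ** 2 + z ** 2 / c ** 2 - 1,  # constraint (6.8-5)
)
volume = 8 * x * y * z

def volume_on_ellipsoid(theta, phi):
    """Volume of the box whose first-octant corner is the ellipsoid point
    (a sin(theta) cos(phi), b sin(theta) sin(phi), c cos(theta))."""
    cx = a * math.sin(theta) * math.cos(phi)
    cy = b * math.sin(theta) * math.sin(phi)
    cz = c * math.cos(theta)
    return 8 * cx * cy * cz

grid = 200
best_sampled = max(
    volume_on_ellipsoid(0.5 * math.pi * i / grid, 0.5 * math.pi * j / grid)
    for i in range(1, grid) for j in range(1, grid)
)
```

All four residuals vanish to rounding error, and no sampled corner gives a larger volume than $8abc/(3\sqrt3)$.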

Lagrange’s method may be extended to the case of several constraints. The 
number of variables is immaterial. An undetermined multiplier is introduced 
corresponding to each constraint. We shall illustrate the extended method 
without formal proof. 

Example 2. Find the semiaxes of the ellipse in which the plane

$$lx + my + nz = 0 \tag{6.8-7}$$

cuts the ellipsoid

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1. \tag{6.8-8}$$

Since the plane passes through the origin, it is clear that the ends of the
semiaxes of the ellipse are at the points (x, y, z) where the function

$$F(x, y, z) = x^2 + y^2 + z^2$$

is a maximum or a minimum subject to the side-conditions (6.8-7) and (6.8-8).
We set up the expression

$$u = x^2 + y^2 + z^2 + \lambda_1(lx + my + nz) + \lambda_2\left(\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2}\right),$$

using two Lagrange multipliers, $\lambda_1$, $\lambda_2$. The method of Lagrange leads us to write
the equations $\frac{\partial u}{\partial x} = \frac{\partial u}{\partial y} = \frac{\partial u}{\partial z} = 0$, or

$$\begin{aligned}
2x + \lambda_1 l + 2\lambda_2\frac{x}{a^2} &= 0, \\
2y + \lambda_1 m + 2\lambda_2\frac{y}{b^2} &= 0, \\
2z + \lambda_1 n + 2\lambda_2\frac{z}{c^2} &= 0.
\end{aligned} \tag{6.8-9}$$


Theoretically we may solve these three equations together with the two
equations of constraint to find the values of x, y, z, $\lambda_1$, $\lambda_2$. Actually, the solution
presents great difficulties. However, since we are merely interested in the lengths
of the semiaxes of the ellipse, it will suffice to solve for the value of F(x, y, z).
Let us write $p^2 = F(x, y, z)$, so that p is the length of the semimajor or
semiminor axis according as F has a maximum or a minimum.

If we multiply equations (6.8-9) by x, y, z respectively, and add, we obtain

$$2(x^2 + y^2 + z^2) + \lambda_1(lx + my + nz) + 2\lambda_2\left(\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2}\right) = 0.$$

In view of (6.8-7) and (6.8-8), this becomes

$$2p^2 + 2\lambda_2 = 0, \quad \text{or} \quad \lambda_2 = -p^2.$$

Returning to (6.8-9) with this result, we have 

2(a 2 -p 2 )x + Aia 2 / = 0, (6.8-10) 


and similar equations involving y and z. Except in special cases the situation will 
be such that x, y, z are all different from zero at the end of a semiaxis; and 
furthermore, p 2 will be different from a 2 , b 2 , and c 2 . Thus in equation (6.8-10) 
we may assume a 2 -p 2 ^ 0 and Ai ^ 0 except in special cases. Thus 


x = X\ 


a 2 l _ x b 2 m 
2 (p 2 -a 2 ) ,y ~ Al 2 (p 2 -b 2 )’ Z 


2 

c n 


1 2 (p 2 - c 2 ) 


If we substitute these values in (6.8-7) and cancel the factor Ai, we obtain the 
equation. 


a 2 l 2 , b 2 m 2 , c 2 n 2 _ 
p 2 -a 2 p 2 -b 2 p 2 -c 2 


(6.8-11) 


When cleared of fractions this equation is a quadratic in p 2 as an unknown. The 
two roots of the equation thus determine the lengths of the semiaxes of the 
ellipse. For some consideration of the exceptional special cases see Exercise 18. 
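
The quadratic obtained from (6.8-11) can be checked numerically. The following is a minimal sketch, assuming the side-conditions are the plane lx + my + nz = 0 and the ellipsoid x^2/a^2 + y^2/b^2 + z^2/c^2 = 1; the function names and the sample values are illustrative choices, not part of the text. It clears fractions in (6.8-11), solves for t = p^2, and compares the roots with a brute-force scan of the ellipse.

```python
import math

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def _unit(u):
    norm = math.sqrt(sum(x * x for x in u))
    return tuple(x / norm for x in u)

def semiaxes_squared(a, b, c, l, m, n):
    """Roots t = p^2 of (6.8-11) after clearing fractions (smaller root first)."""
    al2, bm2, cn2 = (a * l) ** 2, (b * m) ** 2, (c * n) ** 2
    a2, b2, c2 = a * a, b * b, c * c
    # al2 (t-b2)(t-c2) + bm2 (t-a2)(t-c2) + cn2 (t-a2)(t-b2) = q2 t^2 + q1 t + q0
    q2 = al2 + bm2 + cn2
    q1 = -(al2 * (b2 + c2) + bm2 * (a2 + c2) + cn2 * (a2 + b2))
    q0 = al2 * b2 * c2 + bm2 * a2 * c2 + cn2 * a2 * b2
    root = math.sqrt(q1 * q1 - 4 * q2 * q0)
    return (-q1 - root) / (2 * q2), (-q1 + root) / (2 * q2)

def sampled_extremes(a, b, c, l, m, n, steps=20000):
    """Brute-force min and max of x^2 + y^2 + z^2 on the ellipse of intersection."""
    normal = (l, m, n)
    # helper axis that is certainly not parallel to the normal
    helper = (1.0, 0.0, 0.0) if abs(l) <= max(abs(m), abs(n)) else (0.0, 1.0, 0.0)
    e1 = _unit(_cross(normal, helper))   # unit vector lying in the plane
    e2 = _unit(_cross(normal, e1))       # second unit vector in the plane
    values = []
    for k in range(steps):
        phi = math.pi * k / steps        # a half turn suffices by symmetry
        d = [math.cos(phi) * u + math.sin(phi) * v for u, v in zip(e1, e2)]
        q = d[0] ** 2 / a ** 2 + d[1] ** 2 / b ** 2 + d[2] ** 2 / c ** 2
        values.append(1.0 / q)           # the point d / sqrt(q) lies on the ellipsoid
    return min(values), max(values)

roots = semiaxes_squared(3, 2, 1, 1, 1, 1)
scan = sampled_extremes(3, 2, 1, 1, 1, 1)
print(roots, scan)
```

The two pairs of numbers should agree to within the sampling error, and their square roots are the semiaxes of the ellipse.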

The foregoing problem illustrates very clearly the merits of Lagrange’s
method in the preservation of symmetry.

EXERCISES

1. A rectangular box lies in the first octant, with one corner at the origin and the
diagonally opposite corner on the plane (x/a) + (y/b) + (z/c) = 1 (a, b, c > 0). Find the
maximum possible volume of the box.

2. Apply Lagrange’s method in finding the extreme values of x 2 + y 2 + z 2 subject to 
the constraint (x 2 /a 2 ) + (y 2 /b 2 ) + (z 2 /c 2 ) = 1, where a > b > c > 0. 

3. A triangle is such that the product of the sines of its angles is a maximum. Show 
that the triangle is equilateral. 

4. Find the maximum value of xyz/(a 3 x + b 3 y + c 3 z) subject to the conditions 
xyz = A 3 , x, y, z > 0 (a, b, c, A all > 0). 

5. Find the minimum of x + y + z subject to the conditions (a/x) + (b/y) + (c/z) = 1, 
x, y, z > 0 (a, b , c and x, y, z all > 0). 

6. The perimeter of a triangle has a prescribed value 2s. Determine the sides of the 
triangle so as to maximize the area. 

7. Let $D = \begin{vmatrix} x & y \\ u & v \end{vmatrix}$. Find the maximum value of $D^2$ subject to the conditions
$x^2 + y^2 = a^2$, $u^2 + v^2 = b^2$, where $a > 0$, $b > 0$. Solve the problem in two ways: (1) by
Lagrange’s method, and (2) by setting $x = a\cos\theta$, $y = a\sin\theta$, $u = b\cos\phi$, $v = b\sin\phi$,
and using $\theta$, $\phi$ as independent variables.

8. Find the minimum value of x 3 + y 3 + z 3 for positive x, y, and z, if it is required 
that ax + by + cz = 1, where a, b, c are positive constants. 

9. Suppose a, b, c are positive constants. If x, y, z are positive and ayz + bzx + 
cxy = 3abc, show that xyz = abc. 

10. Solve the following problems by Lagrange’s method: (a) Example 1, §6.3;
(b) Example 3, §6.3; (c) Example 2, §6.7; (d) Example 3, §6.7.

11. Let the lengths of the sides of a fixed triangle of area A be a, b, c. From an 
interior point O draw the perpendiculars to the sides of the triangle, and let their lengths 
be x, y, z corresponding to a, b, c. If now a parallelepiped is constructed with edges x, y, z 
and volume V, show that V is a maximum when the lines from O to the vertices of the 
triangle divide it into three equal areas. What is the maximum value of V? 

12. For the situation described in Exercise 11, show that 

$$(x^2 + y^2 + z^2)(a^2 + b^2 + c^2) \geq 4A^2.$$

13. Find the minimum distance from (0,0, c) to the cone z 2 = (x 2 /a 2 ) + (y 2 /b 2 ). 
Assume c > 0 and 0 < b < a. 

14. Given the ellipse b 2 x 2 +a 2 y 2 = a 2 b 2 , find the points (x, y) on the ellipse so that 
the line normal to the ellipse at (x, y) passes as far as possible from 
the origin. What is this greatest distance from the origin to a normal? 

15. Find, by Lagrange’s method, an extreme value of xyz 
subject to the conditions (1/x) + (1/y) + (1/z) = c, x, y, z all >0 (c a 
positive constant). Is this value an absolute maximum, an absolute 
minimum, or neither? 

16. A particle is to travel from A to P and thence to B, by a
broken line as indicated in Fig. 46. Velocity from A to P is $v_1$,
and from P to B is $v_2$. Show by Lagrange’s method that when the
time of travel is least, $(\sin\theta_1)/(\sin\theta_2) = v_1/v_2$.

Fig. 46.

17. Let $p^2 = x^2 + y^2 + z^2$. Consider the extreme values of $p^2$, subject to the conditions
$lx + my + nz = 0$, $(x^2 + y^2 + z^2)^2 = a^2x^2 + b^2y^2 + c^2z^2$, where a > b > c > 0. The minimum of
$p^2$ is obviously zero. Show that, barring exceptional cases, the maximum is given by one
of the roots of the equation

$$\frac{l^2}{p^2 - a^2} + \frac{m^2}{p^2 - b^2} + \frac{n^2}{p^2 - c^2} = 0.$$

Observe from the second constraint that the maximum of $p^2$ is less than $a^2$.

18. This exercise is devoted to one of the exceptional cases which can arise in 
connection with the derivation of (6.8-11) in Example 2. Suppose a > b > 0 and b = c, so 
that the ellipsoid has circular cross sections in planes perpendicular to the x-axis. 
Suppose also that $l \neq 0$. Show that, in this case, the minimum $p^2$ is $b^2$ and that the
maximum p 2 is the sole root of (6.8-11). 

19. Find the maximum and minimum values of the squared distance from the origin 
to the first octant portion of the curve in which the plane x + y + z = 12 meets the surface 
xyz = 54. 

20. Find the minimum of $x_1^2 + \cdots + x_n^2$ subject to the constraint $a_1x_1 + \cdots + a_nx_n = 1$,
where $a_1^2 + \cdots + a_n^2 > 0$.

21. Find the maximum of $\left(\sum_{i=1}^n a_ix_i\right)^2$ subject to the condition $\sum_{i=1}^n x_i^2 = 1$. Assume
that at least one of the a’s $\neq 0$.

22. If P(x, y) is a point on the ellipse $256x^2 + 81y^2 = 2304$ and Q(u, v) is a point on
the line 4x + 3y = 24, find the minimum value of the distance PQ. 

23. Find the minimum distance between the curves x 2 + y 2 = 1, x 2 y = 16. 

24. Let y = f(x) be the equation of a smooth curve, and let P(a, b) be a point not on 
the curve. Let D be the distance from P to a variable point Q of the curve. Prove by 
Lagrange’s method that PQ is normal to the curve when D is a minimum or a maximum. 

25. (a) Use the result of Exercise 24 to give a simple solution of Exercise 23.
(b) Proceed as directed in (a) to solve Exercise 22.

26. If P(x 0 , yo, z 0 ) is a fixed point outside the ellipsoid (x 2 /a 2 ) + (y 2 /b 2 ) + (z 2 /c 2 ) = 1, 
and Q is on the ellipsoid, show that the line PQ is normal to the ellipsoid when the 
distance PQ is a minimum. 

27. Let G(x, y, z) = 0 define a surface S with a tangent plane at each point, and let 
P(a, b, c) be a point not on S. Show that, if Q is a point on S, the line PQ is normal to S
whenever the distance PQ attains a relative extreme. From this result prove that, if A and 
B are variable points on two smooth nonintersecting surfaces, then, when A and B are 
located so as to make the distance between them an absolute minimum, the line AB is 
normal to both surfaces. 

28. Show that the maximum value of $x^2y^2z^2$ subject to $x^2 + y^2 + z^2 = R^2$ is $(R^2/3)^3$.
Deduce from this that

$$(x^2y^2z^2)^{1/3} \leq \frac{x^2 + y^2 + z^2}{3}$$

for all values of x, y, z. Generalize this result to n variables, and so obtain a proof of the
inequality

$$(a_1a_2 \cdots a_n)^{1/n} \leq \frac{a_1 + a_2 + \cdots + a_n}{n}$$

between the geometric mean and arithmetic mean of n positive numbers. What condition
is both necessary and sufficient for equality?

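29. In the case of two positive numbers, Exercise 28 gives

$$u^{1/2}v^{1/2} \leq \tfrac{1}{2}u + \tfrac{1}{2}v.$$

Derive from this the fact that

$$\sum_{i=1}^n a_ib_i \leq \left(\sum_{i=1}^n a_i^2\right)^{1/2}\left(\sum_{i=1}^n b_i^2\right)^{1/2}$$

if the a’s and b’s are any real numbers. This is known as Cauchy’s inequality. Hint: Let

$$u = \frac{a_i^2}{A} \quad \text{and} \quad v = \frac{b_i^2}{B},$$

where

$$A = \sum a_i^2 \quad \text{and} \quad B = \sum b_i^2.$$

What condition is both necessary and sufficient for equality?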

30. If $\alpha_1, \alpha_2, \ldots, \alpha_n$ are nonnegative numbers whose sum is 1, then
$\alpha_1x_1 + \alpha_2x_2 + \cdots + \alpha_nx_n$ is called a weighted arithmetic mean of the x’s. If the x’s are
positive, then one can define the corresponding weighted geometric mean to be
$x_1^{\alpha_1}x_2^{\alpha_2} \cdots x_n^{\alpha_n}$. Show that if all the x’s are positive, the weighted geometric mean is
always less than or equal to the corresponding weighted arithmetic mean. Hint: Maximize
$x_1^{\alpha_1}x_2^{\alpha_2} \cdots x_n^{\alpha_n}$ subject to the constraint $\alpha_1x_1 + \alpha_2x_2 + \cdots + \alpha_nx_n = C$. This will show
that for all positive x’s having a given weighted arithmetic mean C, the weighted geometric
mean is less than or equal to C.

Notice that this result contains that of Exercise 28 as a special case. 

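31. If u, v, p, and q are positive and (1/p) + (1/q) = 1, then Exercise 30 gives

$$u^{1/p}v^{1/q} \leq \frac{u}{p} + \frac{v}{q}.$$

Deduce from this that under the above restrictions on p and q,

$$\sum_{i=1}^n |a_ib_i| \leq \left(\sum_{i=1}^n |a_i|^p\right)^{1/p}\left(\sum_{i=1}^n |b_i|^q\right)^{1/q},$$

where the a’s and b’s are any real numbers. This is known as Hölder’s inequality. Notice
that it is a generalization of Cauchy’s inequality.

Hint: Let

$$u = \frac{|a_i|^p}{\sum_{i=1}^n |a_i|^p} \quad \text{and} \quad v = \frac{|b_i|^q}{\sum_{i=1}^n |b_i|^q}.$$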

32. Use Hölder’s inequality (Exercise 31) to deduce the inequality of Minkowski:

$$\left(\sum_{i=1}^n |x_i + y_i|^p\right)^{1/p} \leq \left(\sum_{i=1}^n |x_i|^p\right)^{1/p} + \left(\sum_{i=1}^n |y_i|^p\right)^{1/p}, \quad p > 1.$$

Write $|x_i + y_i|^p = |x_i + y_i||x_i + y_i|^{p/q} \leq |x_i||x_i + y_i|^{p/q} + |y_i||x_i + y_i|^{p/q}$, and then apply Hölder’s
inequality, once with $a_i = |x_i|$ and once with $a_i = |y_i|$. Observe that Minkowski’s inequality
obviously holds when p = 1.

6.9 / QUADRATIC FORMS


Another application of Lagrange multipliers is to the study of central conics and 
their generalizations. A central conic is simply one that has a center — that is, an 
ellipse or a hyperbola. The parabola, which has no center, will not be treated in 
this approach. Our problem is that of recognizing and describing a central conic 
from its equation. Since this problem for the parabola is comparatively easy, its 
omission from our consideration here is not serious. When we describe a central 
conic in terms of a co-ordinate system whose origin coincides with the center, the 
equation looks like 

Ax 2 + 2Bxy + Cy 2 = k. (6.9-1) 


The left-hand side, 


Q(x, y) = Ax 2 + 2Bxy + Cy 2 


is called a quadratic form. We can define a quadratic form without reference to 
conics by saying that it is a homogeneous polynomial of second degree in two or 
more variables. Homogeneity means the same here as in §6.53. A quadratic form 
in three variables looks like 


Q(x, y, z) = Ax 2 + By 2 + Cz 2 +2Dxy + 2 Exz + 2Fyz, (6.9-2) 


and so on for any number of variables. We shall be mainly concerned with two 
variables, but quadratic forms in three variables will be given some attention. 

Beginning with the simplest case, the problem is to plot the graph of the 
equation (6.9-1). This is easy if B happens to be zero and (6.9-1) reduces to 

Ax 2 + Cy 2 = k. 

If A and C are positive we have an ellipse if k is positive, the point (0, 0) if 
k = 0, and no graph at all if k < 0. If A and C are of opposite signs, we have a 
hyperbola if $k \neq 0$ and two straight lines if $k = 0$. The case where A = 0 or C = 0
is trivial. 

The ease with which these special cases can be treated suggests that when 
confronted by (6.9-1) we try to find some other co-ordinate system in terms of 
which the coefficient B is reduced to zero. If we hold the origin fixed and rotate 
the axes through an angle $\theta$, then the co-ordinates in the new system (say the x',
y'-system) are related to those in the x, y-system by the equations

$$x = x'\cos\theta - y'\sin\theta,$$
$$y = x'\sin\theta + y'\cos\theta.$$  (6.9-3)


Substituting for x and y in (6.9-1) the new equation of the graph can be written 
(after some simplification) in the form 

A'(x') 2 + 2B'x'y' + C'(y') 2 = k 

where A', B' and C' are functions of A, B, C, and $\theta$.

At this point in an introductory analytic geometry course, two very important
results are established. The first is that by proper choice of $\theta$, B' can be
made equal to zero and the equation thereby reduced to canonical form,

$$A'(x')^2 + C'(y')^2 = k.$$  (6.9-4)


The second is a far-reaching and surprising identity relating the original 
coefficients and the new ones. Although A', B', and C' won’t look at all like A,
B, and C, it is nevertheless true that

$$A'C' - (B')^2 = AC - B^2 = \begin{vmatrix} A & B \\ B & C \end{vmatrix}$$  (6.9-5)

no matter what angle of rotation is used. This result is called the invariance of
the discriminant because it says that $AC - B^2$, the discriminant of the quadratic
form, remains unchanged under all rotations of the co-ordinate axes. These
facts from analytic geometry play an essential role in what is to follow.
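
The invariance (6.9-5) is easy to test numerically. The sketch below assumes the coefficient formulas that result from substituting (6.9-3) into (6.9-1) and collecting terms; the function name `rotate_form` and the sample coefficients are illustrative only. It also tries the angle with tan 2θ = 2B/(A - C), the usual choice that makes B' vanish.

```python
import math

def rotate_form(A, B, C, theta):
    """Coefficients A', B', C' of Ax^2 + 2Bxy + Cy^2 after the rotation (6.9-3)."""
    c, s = math.cos(theta), math.sin(theta)
    A1 = A * c * c + 2 * B * c * s + C * s * s
    B1 = (C - A) * c * s + B * (c * c - s * s)
    C1 = A * s * s - 2 * B * c * s + C * c * c
    return A1, B1, C1

A, B, C = 73.0, 36.0, 52.0
for theta in (0.0, 0.3, 1.0, 2.5):
    A1, B1, C1 = rotate_form(A, B, C, theta)
    # the discriminant and the sum A' + C' should not depend on theta
    print(theta, A1 * C1 - B1 * B1, A1 + C1)

# angle that removes the cross term: tan(2 theta) = 2B / (A - C)
theta0 = 0.5 * math.atan2(2 * B, A - C)
print(rotate_form(A, B, C, theta0))
```

For every angle the printed discriminant stays at AC - B^2 = 2500 up to rounding, and at `theta0` the middle coefficient is zero to rounding error.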

We recall from our elementary courses that finding the proper angle of 
rotation and then going through the steps of actually reducing a quadratic form 
to canonical form is rather tedious. By using Lagrange multipliers we are now 
going to develop a way of obtaining the canonical form without going through so 
much computation. 

Suppose that we are given equation (6.9-1) and asked to describe its graph. We can begin
by observing that since quadratic forms are continuous, and the unit circle is closed and
bounded, it follows from Theorem II, §5.3 that Q(x, y), subject to the constraint
$x^2 + y^2 = 1$, has both a maximum and minimum. Let $(x_1, y_1)$ be the co-ordinates in the
x, y-system of a point where Q(x, y) takes on one of these extreme values.

We could then rotate the co-ordinate axes to where the positive axis of abscissas passed
through the point, as in Fig. 47. Then in the rotated co-ordinate system the new
co-ordinates of this point would be x' = 1, y' = 0. The quadratic form Q(x, y)
would be transformed into

$$Q'(x', y') = A'(x')^2 + 2B'x'y' + C'(y')^2.$$


The equation of the unit circle would be the same in the rotated co-ordinate 
system as in the original system, (x') 2 + (y') 2 = 1 , and we would know that the 
function Q'(x', y') would have an extreme value, subject to the constraint 
(x') 2 + (y') 2 =1 at x' = 1, y' = 0. In terms of the associated Lagrange function 

$$L'(x', y') = A'(x')^2 + 2B'x'y' + C'(y')^2 - \lambda[(x')^2 + (y')^2]$$

this means that the two equations

$$\frac{1}{2}\frac{\partial L'}{\partial x'} = (A' - \lambda)x' + B'y' = 0$$
$$\frac{1}{2}\frac{\partial L'}{\partial y'} = B'x' + (C' - \lambda)y' = 0$$  (6.9-6)

must be satisfied by x' = 1, y' = 0. Substituting these two values into the second
equation reveals that B' = 0 and therefore that we have hit upon a rotation which
reduces Q(x, y) to the canonical form

$$Q'(x', y') = A'(x')^2 + C'(y')^2,$$

and the transformed equation of the conic is the canonical form (6.9-4).

It is the invariance of the discriminant which now allows us to find A' and C'
easily. Notice that the Lagrange function, $L(x, y) = Ax^2 + 2Bxy + Cy^2 - \lambda(x^2 + y^2)$,
associated with Q(x, y) is itself a quadratic form having as
its discriminant $(A - \lambda)(C - \lambda) - B^2$. This can be written as the determinant

$$\begin{vmatrix} A - \lambda & B \\ B & C - \lambda \end{vmatrix}.$$

We have just seen that the rotation which transforms the co-ordinates $(x_1, y_1)$ into
(1, 0) transforms L(x, y) into

$$L'(x', y') = A'(x')^2 + C'(y')^2 - \lambda[(x')^2 + (y')^2],$$

which is a quadratic form with discriminant $(A' - \lambda)(C' - \lambda)$. By (6.9-5) we have
that

$$\begin{vmatrix} A - \lambda & B \\ B & C - \lambda \end{vmatrix} = (A' - \lambda)(C' - \lambda).$$

The discriminant of L(x, y) is clearly a second-degree polynomial in $\lambda$ whose
leading coefficient is 1. Each such polynomial has the unique factorization
$(\lambda - r_1)(\lambda - r_2)$ where $r_1$ and $r_2$ are the zeros of the polynomial. From this it
follows that the coefficients A' and C' in the canonical form of the quadratic
form Q(x, y) are simply the two roots of the quadratic equation

$$\begin{vmatrix} A - \lambda & B \\ B & C - \lambda \end{vmatrix} = 0.$$  (6.9-7)


These two roots can of course be found as soon as equation (6.9-1) is given, and 
the equation of the conic reduced immediately to canonical form, without going 
through again each time our fairly lengthy argument which justifies the process. 

Since there are two roots, the reader might wonder which to call A' and 
which C’. The answer is that it doesn’t matter— there are two equally good 
canonical forms to which every quadratic form in two variables (and every 
central conic) can be reduced. This ambiguity was presaged when we began by
picking a point $(x_1, y_1)$ where one of the extreme values of Q(x, y) on the unit
circle was taken on. For our purposes at that time (the elimination of the
xy-term) it made no difference whether this point gave a maximum or a
minimum value to Q. The only difference it makes now is that if this point,
whose co-ordinates in the primed system are x' = 1, y' = 0, is a point of maximum,
then A' is the larger of the two roots. If this point $(x_1, y_1)$ to which we rotate
the positive axis of abscissas gives a minimum, then A' is the smaller of the two
roots. We see this from the obvious fact that if $\alpha > \beta$, then the maximum value
of $\alpha x^2 + \beta y^2$ on the unit circle is $\alpha$, and this value is taken on at (1, 0), as well as
at (-1, 0) of course. If $\alpha < \beta$ then $\alpha$ is the minimum value, and it is taken on at
(1, 0) and (-1, 0). The following example shows how easy it is to find the
canonical form of the equation of a central conic by this method. 

Example 1. Find the dimensions of the ellipse 73x 2 + 72xy + 52y 2 = 100. 

Here A = 73, B = 36, C = 52. 

The equation (6.9-7) becomes

$$\lambda^2 - 125\lambda + 2500 = 0,$$

with roots $\lambda = 25$, $\lambda = 100$. This means, therefore, that the equation of the ellipse
can be put in the form

$$25x'^2 + 100y'^2 = 100, \quad \text{or} \quad \frac{x'^2}{4} + y'^2 = 1,$$

by a rotation of axes. The principal semiaxes of the ellipse are therefore 2 and 1. 
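
The computation in Example 1 can be repeated mechanically for any central conic. The following is a minimal sketch, with hypothetical helper names, that solves (6.9-7) as the quadratic λ^2 - (A + C)λ + (AC - B^2) = 0 and, as a cross-check on the Lagrange argument, scans Q(x, y) around the unit circle.

```python
import math

def canonical_coefficients(A, B, C):
    """Roots of (A - lam)(C - lam) - B^2 = 0, smaller root first."""
    trace = A + C
    disc = A * C - B * B
    root = math.sqrt(trace * trace - 4 * disc)  # equals sqrt((A - C)^2 + 4 B^2)
    return (trace - root) / 2, (trace + root) / 2

def semiaxes(A, B, C, k):
    """Semiaxes of Ax^2 + 2Bxy + Cy^2 = k; assumes an ellipse (both roots > 0, k > 0)."""
    small, large = canonical_coefficients(A, B, C)
    return math.sqrt(k / small), math.sqrt(k / large)

def extremes_on_unit_circle(A, B, C, steps=3600):
    """Brute-force min and max of Q on x^2 + y^2 = 1."""
    values = []
    for i in range(steps):
        t = 2 * math.pi * i / steps
        x, y = math.cos(t), math.sin(t)
        values.append(A * x * x + 2 * B * x * y + C * y * y)
    return min(values), max(values)

print(canonical_coefficients(73, 36, 52))   # roots of (6.9-7) for Example 1
print(semiaxes(73, 36, 52, 100))            # principal semiaxes
print(extremes_on_unit_circle(73, 36, 52))  # should be close to the two roots
```

The scan is only a sanity check: it shows that the extreme values of Q on the unit circle agree with the roots of (6.9-7), which is what the argument above asserts.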

The generalization to the case of three variables is now a rather easy matter. 
The essential statement of the generalization is this: Given the quadratic form 
Q(x, y, z) as in (6.9-2), it is possible to make a rotation of the co-ordinate axes so 
that Q(x , y, z) becomes 

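$$G(x', y', z') = \lambda_1 x'^2 + \lambda_2 y'^2 + \lambda_3 z'^2,$$  (6.9-8)

and the $\lambda$’s are the roots of the equation

$$\begin{vmatrix} A - \lambda & D & E \\ D & B - \lambda & F \\ E & F & C - \lambda \end{vmatrix} = 0.$$  (6.9-9)

A proof of this statement is indicated in Exercise 14. If one wishes, matters may
be arranged so that $\lambda_1 \geq \lambda_2 \geq \lambda_3$.

Example 2. Reduce the quadratic form $xy + xz - yz$ to the form (6.9-8) and
so identify the surface $xy + xz - yz = -1$.

The equation (6.9-9) becomes

$$\begin{vmatrix} -\lambda & \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & -\lambda & -\tfrac{1}{2} \\ \tfrac{1}{2} & -\tfrac{1}{2} & -\lambda \end{vmatrix} = 0$$

in this case. On expansion and simplification this becomes

$$-\tfrac{1}{4} + \tfrac{3}{4}\lambda - \lambda^3 = 0,$$

the roots of which are found to be $\tfrac{1}{2}$, $\tfrac{1}{2}$, $-1$. Thus, after a suitable rotation, our
equation becomes

$$\tfrac{1}{2}x'^2 + \tfrac{1}{2}y'^2 - z'^2 = -1.$$

This defines a hyperboloid of two sheets with circular cross sections perpendicular
to the z'-axis (for |z'| > 1).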
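
The roots quoted in Example 2 can be confirmed with exact rational arithmetic. This is a small sketch, with illustrative function names, that evaluates the determinant in (6.9-9) for the form xy + xz - yz, for which A = B = C = 0, D = E = 1/2 and F = -1/2.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def characteristic_det(A, B, C, D, E, F, lam):
    """Left side of (6.9-9) for Q = Ax^2 + By^2 + Cz^2 + 2Dxy + 2Exz + 2Fyz."""
    return det3([[A - lam, D, E],
                 [D, B - lam, F],
                 [E, F, C - lam]])

half = Fraction(1, 2)
coeffs = (0, 0, 0, half, half, -half)   # xy + xz - yz
values = {lam: characteristic_det(*coeffs, lam)
          for lam in (half, Fraction(-1), Fraction(0))}
print(values)
```

The determinant vanishes at λ = 1/2 and λ = -1, and at λ = 0 it equals -1/4, the constant term of the cubic in the example.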
A more symmetrical notation for quadratic forms is in some ways extremely
desirable. If we write $x_1, x_2, x_3$ instead of x, y, z, a quadratic form in $x_1, x_2, x_3$ will
have terms of all possible types $x_ix_j$, with i and j assuming the values 1, 2, 3. If
we write $a_{ij}$ for the coefficient of $x_ix_j$, the quadratic form will have the appearance

$$F(x_1, x_2, x_3) = a_{11}x_1^2 + a_{12}x_1x_2 + a_{13}x_1x_3$$
$$\qquad + a_{21}x_2x_1 + a_{22}x_2^2 + a_{23}x_2x_3$$  (6.9-10)
$$\qquad + a_{31}x_3x_1 + a_{32}x_3x_2 + a_{33}x_3^2.$$

Since $x_1x_2 = x_2x_1$, we agree to make $a_{12} = a_{21}$ = half the total coefficient of $x_1x_2$;
similarly for $a_{13}$ and $a_{23}$. The discriminant of the form is, by definition, the
determinant

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}.$$

The determinant appearing in (6.9-9) is seen to be the discriminant of the form
$Q(x, y, z) - \lambda(x^2 + y^2 + z^2)$.


EXERCISES 

1. Find the dimensions of the ellipse 41x 2 - 24xy + 34y 2 = 25. 

2 . Show that F(x, y) = x 2 - 4xy - 2y 2 = 1 is the equation of a hyperbola. Find a 
point on the unit circle at which F(x, y) is a maximum. Then draw the xy-axes, the axes 
of symmetry of the hyperbola, and the hyperbola itself. 

3. Find the maximum and minimum values of $F(x, y) = 9x^2 - 6xy + y^2$ on the circle
$x^2 + y^2 = 1$. If $\lambda_1$ is the maximum value, show that $F(x, y) = \lambda_1$ is the equation of two
lines, each tangent to the circle at a point where the maximum occurs.

4. Let $F(x, y, z) = y^2 + z^2 - \sqrt{2}xy + \sqrt{2}xz + 2yz$. Find the maximum and minimum
values of this function on the surface of the unit sphere. What does (6.9-8) become in this
case?

5. Reduce each of the following quadratic forms to the standard form (6.9-8):

(a) $y^2 + z^2 - \sqrt{2}xy - \sqrt{2}xz + 2yz$.

(b) $13x^2 + 13y^2 + 10z^2 + 8xy - 4xz - 4yz$.

6. (a) Put F(x, y, z) = xy + yz + zx in the form (6.9-8). 

(b) What is the maximum value of F on the unit sphere? 

(c) At what points does it occur? 

7. Determine the signs of $\lambda_1$, $\lambda_2$, $\lambda_3$ for each of the following quadratic forms, and
so identify the type of each quadric surface. You need not find the actual value of the $\lambda$’s.

(a) x 2 + xy + yz = 1. 

(b) yz + xz - xy = 1. 

(c) x 2 + 2y 2 + 3z 2 - 2xy - 2yz = 2. 

8. Find maximum and minimum values of $17x^2 - 30xy + 17y^2$ when
$5x^2 - 6xy + 5y^2 = 4$.

9. Find the minimum value of $x^2 + y^2 + xy$ subject to the condition $2x^2 + 6xy + 2y^2 = 9$.

10. Suppose that the locus of $Ax^2 + 2Bxy + Cy^2 = 1$ is an ellipse. Consider the
problem of locating the maximum and minimum values of $x^2 + y^2$ on the ellipse. Apply
Lagrange’s method, starting with the expression $x^2 + y^2 - \lambda(Ax^2 + 2Bxy + Cy^2)$. Show that,
in the resulting equations for locating the extreme values, $\lambda$ must be a root of the
equation

$$\begin{vmatrix} 1 - A\lambda & -B\lambda \\ -B\lambda & 1 - C\lambda \end{vmatrix} = 0$$

and that the roots of this equation are the extreme values of $x^2 + y^2$. What is the relation
between these roots and the semiaxes of the ellipse?

11. If x, y, $\lambda$ are solutions of the simultaneous equations

$$(A - \lambda)x + By = 0,$$
$$Bx + (C - \lambda)y = 0,$$
$$x^2 + y^2 = 1,$$

show that $Ax^2 + 2Bxy + Cy^2 = \lambda$. Hence show that, in the notation used in the text, $\lambda_1$ and
$\lambda_2$ are respectively the maximum and minimum values of F(x, y) when $x^2 + y^2 = 1$.

12. (a) Assume that F(x, y) = k (k > 0) defines a family of ellipses. Let $\lambda_1$ and $\lambda_2$ be
the roots of (6.9-7), with $\lambda_1 \geq \lambda_2$. Show that the ellipse $F(x, y) = \lambda_1$ is externally tangent to
the circle $x^2 + y^2 = 1$ at the ends of the minor axis of the ellipse, and that the ellipse
$F(x, y) = \lambda_2$ is internally tangent to the circle at the ends of the major axis of the ellipse.
Draw a figure showing these two ellipses, the circle, and the x'y'-axes. What can you infer
from the figure about maximum and minimum values of F(x, y) on the circle?

(b) Assume that F(x, y) = k defines a family of hyperbolas. If $\lambda_1$ and $\lambda_2$ are the roots of
(6.9-7) with $\lambda_1 \geq \lambda_2$, explain why $\lambda_1 > 0$ and $\lambda_2 < 0$. Draw a figure showing the x'y'-axes,
the unit circle, the hyperbolas $F(x, y) = \lambda_1$, $F(x, y) = \lambda_2$, and other members of the family.

13. Prove (6.9-5) by actually substituting (6.9-3) in (6.9-1) and computing A', B', C'. 
Also prove that A'+ C' = A + C. 

14. (a) Suppose that Q(x, y, z) in (6.9-2) becomes a new form G(x', y', z') with 
coefficients A', B', . . . , F' when we shift to new axes x'y'z' obtained by a rotation from 
the xyz-system. Write the equations which correspond to (6.9-6) for the problem of 
making G(x', y', z') a maximum on the unit sphere. If the new axes are chosen so that this 
maximum occurs at x' = 1, y' = z' = 0, show that D' = E' = 0, and that
$G(x', y', z') = \lambda_1x'^2 + B'y'^2 + C'z'^2 + 2F'y'z'$, where $\lambda_1$ is the maximum value of G on the unit sphere.
Now explain how it is possible to choose a new set of axes x'', y'', z'', by a rotation from
the x'y'z'-system, rotating about the x'-axis, in such a way that the form becomes
$\lambda_1x''^2 + \lambda_2y''^2 + \lambda_3z''^2$, where $\lambda_2$ and $\lambda_3$ are respectively the maximum and minimum values
of G subject to the two constraints x' = 0, $y'^2 + z'^2 = 1$.

(b) It may be proved algebraically that the discriminant of a quadratic form Q(x, y, z) is
equal to the discriminant of the new form G(x', y', z') which is obtained from Q(x, y, z) by
a rotation of axes. Use this fact to prove that the numbers $\lambda_1$, $\lambda_2$, $\lambda_3$ described in part (a)
of this problem are the roots of the cubic equation (6.9-9).


MISCELLANEOUS EXERCISES 

1. Choose a and b so that $\int_0^1 (\sqrt{x} - a - bx)^2\,dx$ is as small as possible; a + bx is
then called a “least-square” approximation to $\sqrt{x}$ in the interval [0, 1].

2. If $\phi(u, v) = f(x, y)$, where $u = y^2 - x^2$, $v = y^2 + x^2$, show that

$$\frac{1}{4xy}\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2\phi}{\partial v^2} - \frac{\partial^2\phi}{\partial u^2}.$$


3. (a) Show by a diagram the part of the xy -plane in which f(x , y) = 
(a - x)(a - y)(x + y - a) ^ 0. Assume a>0. (b) Find all points of the plane at which 
$f_1(x, y) = f_2(x, y)$. (c) Which of these points yield relative maxima or minima of f(x, y),
and which do not? (d) Does f have any absolute extrema?


4. Let S = 32V3^- + — j + xy(sec 6 — 3 tan 0). Find the minimum value of S in the 


region $x > 0$, $y > 0$, $0 \leq \theta < \pi/2$ of $xy\theta$-space. How do you know that the minimum does
not occur when $\theta = 0$?


5. Find the point of occurrence of the maximum value of x 2 yz 3 on the part of the 
plane x + y + z = 24 that lies in the first octant. 

6. Find the maximum and minimum values of x 2 +y 2 + z 2 subject to the two 
conditions (x 2 /25) + (y 2 /25) + (z 2 /9) = 1, x + y + 2z = 0. 

7. Find the ratios y/x and x/z to make $x^2 + 2xy + x\sqrt{x^2 + z^2}$ a minimum when
$3x^2y + x^2z$ has a prescribed positive value (x, y, z all > 0).

8. Find the maximum and minimum values of z subject to the conditions
$x^2 + y^2 + z^2 = 25$, $(x - 2\sqrt{6})^2 + (y - 2\sqrt{3})^2 + (z - 6)^2 = 13$.

9. Suppose a, b, c, p given all positive, and $p < 1$. Find the maximum of
$ax^p + by^p + cz^p$ subject to the conditions $x + y + z = 1$, x, y, z > 0.

10. Consider S = 2(xy + yz + zx), subject to the conditions ax + by + cz = 2A and x, 
y, z are all positive, where A > 0 and 0<a<b + c, 0<b<c + a, 0 < c < a + b. Show that 
the maximum possible value of S is $8A^2[2(ab + bc + ca) - (a^2 + b^2 + c^2)]^{-1}$.

11. Find the absolute maximum and minimum values of 


$$f(x, y) = 3x^2 - 2(y + 1)x + 3y - 1$$

in the square $0 \leq x \leq 1$, $0 \leq y \leq 1$.

12. (a) Find the minimum distance from the point (3, 4, 15) to the cone 4z 2 = x 2 + y 2 . 

(b) Find the minimum distance from the point (9, 12, -5) to the cone 4z 2 = x 2 + y 2 . 

(c) Are there any relative extrema in addition to the absolute minimum in part (a)? in part (b)?

7 / PRELIMINARY REMARKS

This chapter is not primarily concerned with the technique of partial differen- 
tiation, but with statements and proofs of some of the important theorems about 
differentials and partial differentiation. We have separated the material of the 
chapter from that of Chapter 6 for a number of reasons. In studying the subject 
of partial differentiation the student needs first of all to get acquainted with the 
new ideas which the subject presents to him. He needs to assimilate these ideas 
through the medium of examples and problems. He will want a reasoned 
development of the subject, but he will be more interested in mastery of 
technique and appreciation of some applications than in the details of the longer 
proofs, particularly as regards the proofs of theorems which he is quite willing to 
take for granted in the early stages. The theorems of §§7.1, 7.2 and 7.3 have been 
referred to in Chapter 6. The student needs to know these theorems, but he can 
very well go through Chapter 6 without studying their proofs. 

Sections 7.4 and 7.5 are on a somewhat different footing. Every student who 
uses advanced calculus is likely to have need of Taylor’s formula for a function 
of several variables. One meets references to the formula frequently in the liter- 
ature of applied mathematics and in various branches of higher analysis. The law 
of the mean is merely a special case of Taylor’s formula. We have put this mater- 
ial here rather than in Chapter 6 because its applications are not so immediate. 

The final section of the chapter deals with tests for maximum or minimum 
values of a function in terms of the second partial derivatives. These tests are of 
great importance — both practical and theoretical. 

Within the limits of time of an ordinary year course in advanced calculus the 
instructor may wish to make only a selection from Chapter 7 with such a degree 
of emphasis on the proofs as he or she sees fit. The various sections are 
practically independent of each other, except that §§7.4 and 7.5 go together. 

The definition of differentiability given in §6.4 can be reformulated slightly in 
equivalent ways that we shall find useful. We shall give the reformulations for a 
function of n variables. We use the notation 

$$\|h\| = (h_1^2 + \cdots + h_n^2)^{1/2}$$  (7-1)

introduced in connection with (6.4-17).

A function f of n variables that is defined in a neighborhood of the point
$(x_1, \ldots, x_n)$ is differentiable at $(x_1, \ldots, x_n)$ if and only if (1) all of the first partial
derivatives of f exist at $(x_1, \ldots, x_n)$ and (2) we can write the formula

$$f(x_1 + h_1, \ldots, x_n + h_n) - f(x_1, \ldots, x_n) = \sum_{i=1}^n f_i(x_1, \ldots, x_n)h_i + \epsilon\|h\|,$$  (7-2)

where $\epsilon$ is a variable quantity depending on $h_1, \ldots, h_n$, with value 0 when
$h_1 = \cdots = h_n = 0$, and such that $\epsilon \to 0$ when $\|h\| \to 0$.

This formulation is obviously equivalent to that stated in connection with
(6.4-17), because the $A_i$’s in (6.4-17) must of necessity be given by
$A_i = f_i(x_1, \ldots, x_n)$ [see (6.4-18)].

Another equivalent formulation is obtained if we replace $\|h\|$ in (7-2) by $\|h\|^*$,
where

$$\|h\|^* = |h_1| + \cdots + |h_n|.$$  (7-3)

The reason for the equivalence is that

$$\|h\| \leq \|h\|^* \leq \sqrt{n}\,\|h\|,$$  (7-4)

or, in more explicit form,

$$(h_1^2 + \cdots + h_n^2)^{1/2} \leq |h_1| + \cdots + |h_n| \leq \sqrt{n}\,(h_1^2 + \cdots + h_n^2)^{1/2}.$$  (7-5)

The first of these inequalities is easily proved, for if we calculate
$(|h_1| + \cdots + |h_n|)^2$, we obtain all of the terms $h_1^2, \ldots, h_n^2$ plus additional terms, none
of them negative. The second inequality in (7-5) is a consequence of Cauchy’s
inequality, given in Exercise 29, §6.8. (In that inequality put $a_i = |h_i|$ and $b_i = 1$.)

Because of the foregoing we can see that it does not matter whether we write
$\epsilon\|h\|$ or $\epsilon^*\|h\|^*$ in (7-2) (with $\epsilon^* \to 0$ as $\|h\|^* \to 0$). For we see by (7-4) that $\|h\| \to 0$ is
equivalent to $\|h\|^* \to 0$, and if $\epsilon\|h\| = \epsilon^*\|h\|^*$, we see that $\epsilon \to 0$ is equivalent to
$\epsilon^* \to 0$ because $|\epsilon| \leq \sqrt{n}\,|\epsilon^*|$ and $|\epsilon^*| \leq |\epsilon|$ (with $\epsilon = \epsilon^* = 0$ if all the h’s are 0).
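
The two-sided estimate (7-4) can be spot-checked numerically. The sketch below is only an illustration under assumed names (`norm2` for the Euclidean norm, `norm1` for the sum of absolute values); it tests random vectors together with the two cases in which equality holds.

```python
import math
import random

def norm2(h):
    """Euclidean norm, as in (7-1)."""
    return math.sqrt(sum(x * x for x in h))

def norm1(h):
    """Sum of absolute values, as in (7-3)."""
    return sum(abs(x) for x in h)

def satisfies_7_4(h, tol=1e-12):
    """True if norm2(h) <= norm1(h) <= sqrt(n) * norm2(h), up to rounding."""
    n = len(h)
    return norm2(h) <= norm1(h) + tol and norm1(h) <= math.sqrt(n) * norm2(h) + tol

random.seed(0)
samples = [[random.uniform(-1, 1) for _ in range(n)]
           for n in range(1, 8) for _ in range(200)]
all_ok = all(satisfies_7_4(h) for h in samples)
print(all_ok)

# equality cases: one nonzero component (left side), all components equal (right side)
print(norm2([3, 0, 0]), norm1([3, 0, 0]))
print(norm1([1, 1, 1]), math.sqrt(3) * norm2([1, 1, 1]))
```

The small tolerance only absorbs floating-point rounding in the equality cases; it plays no role in the inequality itself.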

7.1 / SUFFICIENT CONDITIONS FOR DIFFERENTIABILITY 

The concept of differentiability for a function of several variables was defined in 
§6.4. To be differentiable at a given point a function must have first partial 
derivatives at that point. But this alone is not enough. We may have a function 
f(x, y) such that $f_1(0, 0)$ and $f_2(0, 0)$ exist, and yet such that f is not differentiable
at (0,0); for an illustration see Exercise 7, §6.4. The following theorem deals 
with sufficient conditions for differentiability: 

THEOREM I. Suppose the function f(x, y) is defined in some neighborhood of
the point (a, b). Suppose one of the partial derivatives, say $\partial f/\partial x$, exists at each
point of the neighborhood and is continuous at (a, b), while the other partial
derivative is defined at least at the point (a, b). Then f is differentiable at
(a, b).

Proof. We shall use one of the formulations of differentiability from the
preceding section. We shall show that we can write

$$f(a + h, b + k) - f(a, b) = f_1(a, b)h + f_2(a, b)k + \epsilon(|h| + |k|),$$  (7.1-1)

where $\epsilon \to 0$ as $(h, k) \to (0, 0)$. We work with small values of h and k, and we
express the left side of (7.1-1) as the sum of two differences:

$$f(a + h, b + k) - f(a, b) = [f(a + h, b + k) - f(a, b + k)] + [f(a, b + k) - f(a, b)].$$  (7.1-2)

Next we apply the law of the mean (Theorem IV, §1.2) to f(x, b + k) as a
function of x alone. The result is that there is a point between x = a and
x = a + h, which we may denote by $x = a + \theta h$, where $0 < \theta < 1$, such that

$$f(a + h, b + k) - f(a, b + k) = f_1(a + \theta h, b + k)h.$$

Because $f_1$ is assumed to be continuous at (a, b), we can write

$$f_1(a + \theta h, b + k) = f_1(a, b) + \epsilon_1,$$

where $\epsilon_1 \to 0$ as $h \to 0$ and $k \to 0$. Thus we have

$$f(a + h, b + k) - f(a, b + k) = f_1(a, b)h + \epsilon_1 h,$$


and we can put the expression on the right of the equality sign here in place of
the first difference on the right in (7.1-2). For the other difference on the right in
(7.1-2) we use the definition of $f_2(a, b)$ as the limit of a quotient to write (when
$k \neq 0$)

$$\frac{f(a, b + k) - f(a, b)}{k} = f_2(a, b) + \epsilon_2,$$

where $\epsilon_2 \to 0$ as $k \to 0$. We define $\epsilon_2$ to be 0 if k = 0. This permits us to replace the
second difference on the right in (7.1-2) by $f_2(a, b)k + \epsilon_2 k$. The result is that we
have

$$f(a + h, b + k) - f(a, b) = f_1(a, b)h + f_2(a, b)k + \epsilon_1 h + \epsilon_2 k.$$
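To bring this equation into the form of (7.1-1) we define

$$\epsilon = \frac{\epsilon_1 h + \epsilon_2 k}{|h| + |k|} \quad \text{if } |h| + |k| \neq 0$$

and $\epsilon = 0$ if $|h| + |k| = 0$. Then, if $|h| + |k| \neq 0$,

$$|\epsilon| \leq |\epsilon_1|\frac{|h|}{|h| + |k|} + |\epsilon_2|\frac{|k|}{|h| + |k|} \leq |\epsilon_1| + |\epsilon_2|,$$

because, clearly, $|h| \leq |h| + |k|$ and $|k| \leq |h| + |k|$. But now it is evident that $\epsilon \to 0$
when $|h| + |k| \to 0$, because $\epsilon_1 \to 0$ and $\epsilon_2 \to 0$. The proof is now complete.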


It will be observed that the conditions of the theorem are not symmetrical as
regards $x$ and $y$. We might equally well have assumed the mere existence of
$f_1(a, b)$, while assuming the continuity of $f_2(x, y)$ at $(a, b)$. In
general, for a function of more than two variables, we assume the mere existence
of one of the first partial derivatives, and the continuity of the other first
partial derivatives. We may then conclude that the function is differentiable.
The proof is similar to that of Theorem I. For most practical purposes the
following statement is sufficient:

THEOREM II. A function of several variables is differentiable at a point if the 
function and all its first partial derivatives are defined in some neighborhood 
of the point and if these derivatives are continuous at the point. 
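
As a numerical illustration of the quantity $\epsilon$ in (7.1-1), the following
Python sketch solves (7.1-1) for $\epsilon$ and shows it shrinking as $(h, k)$
approaches $(0, 0)$. The sample function $f(x, y) = x^2 y + \sin y$, the point
$(1, 2)$, and the step sizes are assumptions chosen for illustration; they are
not taken from the text.

```python
import math

def f(x, y):
    # Sample function (an assumption): its first partials are continuous everywhere.
    return x * x * y + math.sin(y)

def f1(x, y):
    # Partial derivative of f with respect to x.
    return 2.0 * x * y

def f2(x, y):
    # Partial derivative of f with respect to y.
    return x * x + math.cos(y)

def epsilon(a, b, h, k):
    # Solve (7.1-1) for epsilon; valid only when (h, k) != (0, 0).
    increment = f(a + h, b + k) - f(a, b)
    linear_part = f1(a, b) * h + f2(a, b) * k
    return (increment - linear_part) / (abs(h) + abs(k))

a, b = 1.0, 2.0
steps = (1e-1, 1e-2, 1e-3)
eps_values = [abs(epsilon(a, b, t, -t)) for t in steps]
for t, e in zip(steps, eps_values):
    print(f"h = {t:g}, k = {-t:g}: |epsilon| = {e:.3e}")
```

A table of shrinking values is only evidence, of course; the theorem is what
guarantees that $\epsilon \to 0$ along every approach to $(0, 0)$.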


EXERCISES 

1. Let $f(x, y) = (x^4 + y^4)/(x^2 + y^2)$ if $x^2 + y^2 \ne 0$, and define
$f(0, 0) = 0$. Show that $f$ has first partial derivatives at all points,
satisfying the inequalities $|f_1(x, y)| \le 6|x|$, $|f_2(x, y)| \le 6|y|$.
Is $f$ differentiable at $(0, 0)$?

2. Let $f(x, y) = (x^3 - y^3)/(x^2 + y^2)$ if $x^2 + y^2 \ne 0$, and define
$f(0, 0) = 0$. Show that $f$ has first partial derivatives at all points, but
that these derivatives are discontinuous at $(0, 0)$. The function is not
differentiable at $(0, 0)$. To prove this, show first that if it were
differentiable, one could write

\[ \frac{x^3 - y^3}{x^2 + y^2} = x - y + \epsilon(|x| + |y|), \]

where $\epsilon \to 0$ as $(x, y) \to (0, 0)$. Then show that this is impossible.
Suggestion: Consider the situation when $y = -x$.

3. Let $f(x, y) = (x^2 + y^2) \sin \dfrac{1}{\sqrt{x^2 + y^2}}$ if
$x^2 + y^2 \ne 0$, and define $f(0, 0) = 0$. Show that $f$ has first partial
derivatives at all points, but that these derivatives are discontinuous at
$(0, 0)$. Show that $|f_1(x, y)| \le 2|x| + 1$. Prove that $f$ is differentiable
at $(0, 0)$.

This example shows that the hypotheses in Theorems I and II are sufficient, but
not necessary, conditions for differentiability.

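4. Prove that the function

\[ f(x, y) = [\text{expression not recoverable from the source}] \quad \text{if } (x, y) \ne (0, 0), \qquad f(0, 0) = 0 \]

satisfies Laplace’s equation,

\[ \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0, \]

everywhere, but that $f$ is not even continuous (let alone differentiable) at
the origin.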


7.2 / CHANGING THE ORDER OF DIFFERENTIATION 

We mentioned at the outset of Chapter 6 that we ordinarily find the relation

\[ \frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x} \tag{7.2-1} \]

to be valid for the functions $f(x, y)$ which we meet in everyday use of
calculus. The relation (7.2-1) may be false in particular cases, however, and so
it is well to know something of the conditions sufficient to guarantee its
validity.

THEOREM III. Let the function $f(x, y)$ be defined in some neighborhood of the
point $(a, b)$. Let the partial derivatives $f_1$, $f_2$, $f_{12}$ and $f_{21}$
also be defined in this neighborhood, and suppose that $f_{12}$ and $f_{21}$ are
continuous at $(a, b)$. Then $f_{12}(a, b) = f_{21}(a, b)$. In other words,
(7.2-1) holds at the point $(a, b)$.


Proof. We shall work entirely inside a square having its center at $(a, b)$, and
lying inside the neighborhood referred to in the theorem. Let $h$ be a number
different from zero such that the point $(a + h, b + h)$ is inside the square
just referred to. Consider the expression

\[ D = f(a + h, b + h) - f(a + h, b) - f(a, b + h) + f(a, b). \]

If we introduce the function

\[ \phi(x) = f(x, b + h) - f(x, b), \]

we can express $D$ in the form

\[ D = \phi(a + h) - \phi(a). \tag{7.2-2} \]

Now $\phi$ has the derivative

\[ \phi'(x) = f_1(x, b + h) - f_1(x, b). \]

Hence $\phi$ is continuous, and we may apply the law of the mean to (7.2-2), with
the result

\[ D = h\phi'(a + \theta_1 h) = h[f_1(a + \theta_1 h, b + h) - f_1(a + \theta_1 h, b)], \tag{7.2-3} \]

where $0 < \theta_1 < 1$. Next, let

\[ g(y) = f_1(a + \theta_1 h, y). \]

The function $g$ has the derivative

\[ g'(y) = f_{12}(a + \theta_1 h, y). \]

Now we can write (7.2-3) in the form

\[ D = h[g(b + h) - g(b)] \]

and apply the law of the mean. The result is

\[ D = h^2 g'(b + \theta_2 h) = h^2 f_{12}(a + \theta_1 h, b + \theta_2 h), \]

where $0 < \theta_2 < 1$.

We might instead have started by expressing $D$ in the form

\[ D = \psi(b + h) - \psi(b), \]

where

\[ \psi(y) = f(a + h, y) - f(a, y). \]

This procedure would have led to an expression

\[ D = h^2 f_{21}(a + \theta_4 h, b + \theta_3 h), \]

with $0 < \theta_3 < 1$, $0 < \theta_4 < 1$. On comparing the two expressions for
$D$ we see that

\[ f_{12}(a + \theta_1 h, b + \theta_2 h) = f_{21}(a + \theta_4 h, b + \theta_3 h). \tag{7.2-4} \]

If we now make $h \to 0$, the points at which the derivatives in (7.2-4) are
evaluated both approach $(a, b)$. Hence, by the assumed continuity of $f_{12}$
and $f_{21}$, we conclude that $f_{12}(a, b) = f_{21}(a, b)$. This completes the
proof.
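
The role of the second difference $D$ can be seen numerically: for a function
with continuous mixed partials, $D/h^2$ should approach the common value
$f_{12}(a, b) = f_{21}(a, b)$ as $h \to 0$. The Python sketch below checks this
for a sample function; the function $e^x \sin y + x^2 y^3$, the point
$(0.5, 1)$, and the values of $h$ are assumptions made for illustration.

```python
import math

def f(x, y):
    # Sample function (an assumption); all its partial derivatives are continuous.
    return math.exp(x) * math.sin(y) + x * x * y ** 3

def mixed_partial_exact(x, y):
    # For this f, f_12 = f_21 = e^x cos y + 6 x y^2.
    return math.exp(x) * math.cos(y) + 6.0 * x * y * y

def second_difference(a, b, h):
    # D = f(a+h, b+h) - f(a+h, b) - f(a, b+h) + f(a, b), as in the proof.
    return f(a + h, b + h) - f(a + h, b) - f(a, b + h) + f(a, b)

a, b = 0.5, 1.0
exact = mixed_partial_exact(a, b)
errors = []
for h in (1e-1, 1e-2, 1e-3):
    quotient = second_difference(a, b, h) / (h * h)
    errors.append(abs(quotient - exact))
    print(f"h = {h:g}: D/h^2 = {quotient:.6f}, mixed partial = {exact:.6f}")
```

The discrepancy falls roughly in proportion to $h$, which is what one would
expect when the third-order partial derivatives are bounded near $(a, b)$.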

The conditions of Theorem III are not the only known sufficient conditions 
for the truth of (7.2-1). The theorem provides a useful working criterion, 
however. Another criterion is furnished by the following theorem: 

THEOREM IV. Let $f(x, y)$ and its first partial derivatives $f_1$, $f_2$ be
defined in a neighborhood of the point $(a, b)$, and suppose that $f_1$ and
$f_2$ are differentiable at that point. Then $f_{12}(a, b) = f_{21}(a, b)$.

This theorem requires more of the function / in some ways, and less in 
others, than Theorem III. We omit the proof, which begins very much like that 
of Theorem III. 
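
That some hypothesis is needed can be seen from the function
$f(x, y) = xy(x^2 - y^2)/(x^2 + y^2)$, $f(0, 0) = 0$, which appears in the
exercises at the end of this section. The Python sketch below estimates
$f_{12}(0, 0)$ and $f_{21}(0, 0)$ by finite differences. The step sizes are
assumptions, and the estimates are numerical evidence only, not a proof.

```python
def f(x, y):
    # f(x, y) = x y (x^2 - y^2) / (x^2 + y^2), with f(0, 0) = 0.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def f1(x, y, e=1e-6):
    # Central-difference estimate of the partial derivative with respect to x.
    return (f(x + e, y) - f(x - e, y)) / (2.0 * e)

def f2(x, y, e=1e-6):
    # Central-difference estimate of the partial derivative with respect to y.
    return (f(x, y + e) - f(x, y - e)) / (2.0 * e)

k = 1e-2  # outer step, much larger than the inner step e (an assumption)
f12_at_origin = (f1(0.0, k) - f1(0.0, 0.0)) / k  # differentiate f_1 in y
f21_at_origin = (f2(k, 0.0) - f2(0.0, 0.0)) / k  # differentiate f_2 in x
print(f"f_12(0, 0) is approximately {f12_at_origin:.6f}")
print(f"f_21(0, 0) is approximately {f21_at_origin:.6f}")
```

The two estimates come out near $-1$ and $1$ respectively, so the order of
differentiation matters at the origin for this function.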

We may use Theorem III or Theorem IV to prove that a mixed partial 
derivative of order higher than the second is independent of the order of 
performing the differentiations, provided we make appropriate assumptions 
about the continuity or differentiability. Thus, for example, suppose that
$f(x, y)$ and all its partial derivatives of orders one, two, and three are
continuous. (This is more than we actually need.) Then

\[ \frac{\partial^3 f}{\partial y^2\,\partial x} = \frac{\partial^3 f}{\partial y\,\partial x\,\partial y} = \frac{\partial^3 f}{\partial x\,\partial y^2}. \]

For

\[ f_{122} = (f_{12})_2 = (f_{21})_2 = f_{212}, \]

and

\[ f_{212} = (f_2)_{12} = (f_2)_{21} = f_{221}. \]

By similar arguments we can deal with functions of more than two independent
variables.

EXERCISES 

1. Let $f(x, y) = xy\,\dfrac{x^2 - y^2}{x^2 + y^2}$ if $x^2 + y^2 \ne 0$, and
define $f(0, 0) = 0$. Show that $f_1(0, y) = -y$ and $f_2(x, 0) = x$ for all $x$
and $y$. Then show that $f_{12}(0, 0) = -1$ and $f_{21}(0, 0) = 1$.

2. Define $f(x, y) = x^2 \tan^{-1}(y/x) - y^2 \tan^{-1}(x/y)$ if neither $x$ nor
$y$ is zero, and $f(x, y) = 0$ if either $x = 0$ or $y = 0$ (or both). Show as
in Exercise 1 that $f_{12}(0, 0) = -1$, $f_{21}(0, 0) = 1$.

3. Theorem III can be improved as follows:

Assume that $f$ is defined in some neighborhood of the point $(a, b)$, and that
the partial derivatives $f_1$, $f_2$, $f_{12}$ are also defined in this
neighborhood. Suppose finally that $f_{12}$ is continuous at $(a, b)$. Then it is
true that $f_{21}$ is defined at $(a, b)$ and that
$f_{12}(a, b) = f_{21}(a, b)$. (This theorem is due to H. A. Schwarz.)

Prove the theorem with the assistance of the following suggestions: Let $h$, $k$
be small numbers different from zero. Let

\[ \Delta = f(a + h, b + k) - f(a + h, b) - f(a, b + k) + f(a, b). \]

Show that there are numbers $\theta_1$, $\theta_2$ between 0 and 1, depending on
$h$ and $k$, such that

\[ \Delta = hk f_{12}(a + \theta_1 h, b + \theta_2 k). \]

If $\epsilon > 0$, choose $\delta > 0$ so that
$\left| \dfrac{\Delta}{hk} - f_{12}(a, b) \right| < \epsilon$ if
$0 < |h| < \delta$ and $0 < |k| < \delta$. Why is this possible? Find the limit
of $\dfrac{\Delta}{hk}$ with $h$ fixed, as $k \to 0$, and so conclude that

\[ \left| \frac{f_2(a + h, b) - f_2(a, b)}{h} - f_{12}(a, b) \right| \le \epsilon \]

if $0 < |h| < \delta$. Now complete the proof of Schwarz’s theorem.

4. To prove Theorem IV, start as in the proof of Theorem III, and obtain (7.2-3).
Then, from the fact that $f_1$ is differentiable at $(a, b)$, one can write

\[ f_1(a + \theta_1 h, b + h) = f_1(a, b) + f_{11}(a, b)\theta_1 h + f_{12}(a, b)h + \epsilon_1 |h|, \]

where $\epsilon_1 \to 0$ as $h \to 0$. Explain why this is so.

Then go on to explain how to obtain the expression

\[ D = h^2 f_{12}(a, b) + \epsilon |h| h, \]

where $\epsilon \to 0$ as $h \to 0$. Explain the derivation of the similar
expression (where $\epsilon' \to 0$ as $h \to 0$)

\[ D = h^2 f_{21}(a, b) + \epsilon' |h| h, \]

using the fact that $f_2$ is differentiable at $(a, b)$. Now complete the proof
of Theorem IV.


7.3 / DIFFERENTIALS OF COMPOSITE FUNCTIONS 

In §6.4 we made the statement that a differentiable function of differentiable 
functions is differentiable. We now formulate this proposition in precise terms, 
and prove it. There may be any number of independent variables in each of the 
functions involved. For simplicity we deal with the case of two variables 
throughout. 


THEOREM V. Let $F(x, y)$ be defined in some neighborhood of the point $(a, b)$,
and let it be differentiable there. Let $f(s, t)$ and $g(s, t)$ be defined in
some neighborhood of $(s_0, t_0)$, and differentiable at that point. Assume
further that

\[ f(s_0, t_0) = a, \qquad g(s_0, t_0) = b, \]

and consider the composite function

\[ G(s, t) = F(f(s, t), g(s, t)). \]

Then $G$ is differentiable at $(s_0, t_0)$. Its differential may be written

\[ dG = \frac{\partial F}{\partial x}\,dx + \frac{\partial F}{\partial y}\,dy, \tag{7.3-1} \]

where

\[ dx = \frac{\partial f}{\partial s}\,ds + \frac{\partial f}{\partial t}\,dt, \qquad dy = \frac{\partial g}{\partial s}\,ds + \frac{\partial g}{\partial t}\,dt. \tag{7.3-2} \]

It is to be understood that the partial derivatives of $F$ are evaluated at
$(a, b)$, those of $f$ and $g$ at $(s_0, t_0)$, and that $ds$, $dt$ are
independent variables.


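Proof. Let us write $u = G(s, t)$, $x = f(s, t)$, $y = g(s, t)$. For arbitrary
$\Delta s$, $\Delta t$ write
$\Delta u = G(s_0 + \Delta s, t_0 + \Delta t) - G(s_0, t_0)$, with corresponding
meanings for $\Delta x$, $\Delta y$. Note that
$\Delta u = F(a + \Delta x, b + \Delta y) - F(a, b)$. Accordingly, using one of
the formulations of the differentiability condition discussed in §7, we may
write

\[ \begin{aligned}
\Delta u &= F_1\,\Delta x + F_2\,\Delta y + \epsilon(|\Delta x| + |\Delta y|), \\
\Delta x &= f_1\,\Delta s + f_2\,\Delta t + \delta(|\Delta s| + |\Delta t|), \\
\Delta y &= g_1\,\Delta s + g_2\,\Delta t + \eta(|\Delta s| + |\Delta t|),
\end{aligned} \tag{7.3-3} \]

where $\epsilon \to 0$ as $\Delta x$ and $\Delta y \to 0$, while $\delta \to 0$
and $\eta \to 0$ as $\Delta s$ and $\Delta t \to 0$. Here, for convenience and
brevity, we are using $F_1$ to stand for $F_1(a, b)$, $f_1$ for
$f_1(s_0, t_0)$, and so on. From (7.3-3) we see that

\[ \Delta u = F_1(f_1\,\Delta s + f_2\,\Delta t) + F_2(g_1\,\Delta s + g_2\,\Delta t) + (F_1\delta + F_2\eta)(|\Delta s| + |\Delta t|) + \epsilon(|\Delta x| + |\Delta y|). \]

This may be rewritten in the form

\[ \Delta u = (F_1 f_1 + F_2 g_1)\,\Delta s + (F_1 f_2 + F_2 g_2)\,\Delta t + \left[ F_1\delta + F_2\eta + \epsilon\,\frac{|\Delta x| + |\Delta y|}{|\Delta s| + |\Delta t|} \right](|\Delta s| + |\Delta t|). \]

If we can show that

\[ \lim_{(\Delta s, \Delta t) \to (0, 0)} \left[ F_1\delta + F_2\eta + \epsilon\,\frac{|\Delta x| + |\Delta y|}{|\Delta s| + |\Delta t|} \right] = 0, \tag{7.3-4} \]

we shall have proved that $G$ is differentiable, with the differential

\[ dG = (F_1 f_1 + F_2 g_1)\,ds + (F_1 f_2 + F_2 g_2)\,dt. \]

This last form is equivalent to the combination of (7.3-1) and (7.3-2). Hence it
remains only to prove (7.3-4).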

Now $F_1$ and $F_2$ denote constants in (7.3-4), so that
$F_1\delta + F_2\eta \to 0$. The functions $f$ and $g$ are continuous at
$(s_0, t_0)$ by Theorem II, §6.4. Hence $\Delta x \to 0$ and $\Delta y \to 0$,
and therefore $\epsilon \to 0$ as $\Delta s$ and $\Delta t \to 0$. Let $M$ be a
constant larger than the greatest of the numbers $|f_1|$, $|f_2|$, $|g_1|$,
$|g_2|$. Then by (7.3-3) we see that

\[ |\Delta x| \le (M + |\delta|)(|\Delta s| + |\Delta t|), \qquad |\Delta y| \le (M + |\eta|)(|\Delta s| + |\Delta t|). \]

Hence

\[ |\epsilon|\,\frac{|\Delta x| + |\Delta y|}{|\Delta s| + |\Delta t|} \le |\epsilon|(2M + |\delta| + |\eta|). \]

This quantity approaches zero as $\Delta s$ and $\Delta t$ approach zero. The
assertion (7.3-4) is thereby established.
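
The coefficients of $ds$ and $dt$ in the differential of the composite function
can be checked against difference quotients. The Python sketch below does this
for one choice of functions; $F(x, y) = x^2 y$, $f(s, t) = s \cos t$,
$g(s, t) = s \sin t$, the point $(s_0, t_0) = (2, 0.7)$, and the step size are
all assumptions made for illustration.

```python
import math

def F(x, y):
    return x * x * y          # sample outer function (an assumption)

def F1(x, y):
    return 2.0 * x * y        # partial of F with respect to x

def F2(x, y):
    return x * x              # partial of F with respect to y

def f(s, t):
    return s * math.cos(t)    # sample inner function x = f(s, t)

def g(s, t):
    return s * math.sin(t)    # sample inner function y = g(s, t)

def G(s, t):
    return F(f(s, t), g(s, t))

s0, t0 = 2.0, 0.7
a, b = f(s0, t0), g(s0, t0)

# Partial derivatives of f and g at (s0, t0), computed by hand.
f_s, f_t = math.cos(t0), -s0 * math.sin(t0)
g_s, g_t = math.sin(t0), s0 * math.cos(t0)

# Coefficients of ds and dt in the differential dG.
dG_ds = F1(a, b) * f_s + F2(a, b) * g_s
dG_dt = F1(a, b) * f_t + F2(a, b) * g_t

# Central difference quotients of G for comparison.
d = 1e-6
num_ds = (G(s0 + d, t0) - G(s0 - d, t0)) / (2.0 * d)
num_dt = (G(s0, t0 + d) - G(s0, t0 - d)) / (2.0 * d)
print(f"dG/ds: formula {dG_ds:.8f}, difference quotient {num_ds:.8f}")
print(f"dG/dt: formula {dG_dt:.8f}, difference quotient {num_dt:.8f}")
```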

7.4 / THE LAW OF THE MEAN 

The student is already familiar with the law of the mean for functions of a
single independent variable, in the form

\[ f(a + h) - f(a) = h f'(a + \theta h), \qquad 0 < \theta < 1 \tag{7.4-1} \]

(see (1.2-4) and Theorem IV, §1.2). It is useful to have a generalization of this
result for functions of several independent variables. In seeking an appropriate
form for such a generalization, we look upon (7.4-1) as furnishing a convenient
formula for the difference between the values of the function $f$ at two points
of the $x$-axis, namely $x = a$ and $x = a + h$. This leads us, in the case of a
function of two variables, to search for a means of expressing the difference

F(a + h, b + k) - F(a, b), 

where the line segment joining the points (a, b), (a + h, b + k) lies in the region of 
definition of the function F. 

THEOREM VI. Let $F$ be defined in a region $R$ of the $xy$-plane. Let $L$ be the
line segment with ends $(a, b)$, $(a + h, b + k)$. We suppose that $L$ lies in
$R$ and that all points of $L$ except possibly the ends are interior points of
$R$. Finally, we assume that $F$ is continuous at each point of $L$ and
differentiable at each such point with the possible exception of the ends. Then,
for a certain value of $\theta$, such that $0 < \theta < 1$, we have

\[ F(a + h, b + k) - F(a, b) = hF_1(a + \theta h, b + \theta k) + kF_2(a + \theta h, b + \theta k). \tag{7.4-2} \]

Proof. By introducing a parametric representation of the line segment $L$,

\[ x = a + th, \qquad y = b + tk, \qquad 0 \le t \le 1, \]

we are able to regard the value of $F$ along $L$ as a function of the parameter
$t$. Let us write

\[ f(t) = F(a + th, b + tk). \tag{7.4-3} \]

The derivative of this composite function is

\[ f'(t) = hF_1(a + th, b + tk) + kF_2(a + th, b + tk). \tag{7.4-4} \]

Applying the ordinary law of the mean in the form (7.4-1), we have

\[ f(1) - f(0) = f'(\theta), \qquad 0 < \theta < 1. \tag{7.4-5} \]

On setting $t = 0, 1$ successively in (7.4-3), and $t = \theta$ in (7.4-4), we
see that (7.4-5) becomes the formula (7.4-2). The proof is therefore complete.
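
The theorem asserts only that a suitable $\theta$ exists; in a concrete case it
can be located numerically. The Python sketch below does so by bisection for the
sample function $F(x, y) = x^3 + y$ with $a = b = 0$, $h = k = 1$. The function,
the points, and the use of bisection are assumptions chosen for illustration,
and bisection is applicable here only because the gap between the two sides of
(7.4-2) changes sign on $0 \le \theta \le 1$.

```python
def F(x, y):
    return x ** 3 + y         # sample function (an assumption)

def F1(x, y):
    return 3.0 * x * x        # partial of F with respect to x

def F2(x, y):
    return 1.0                # partial of F with respect to y

def mean_value_theta(a, b, h, k, tol=1e-12):
    # Find theta in (0, 1) for which the two sides of (7.4-2) agree.
    target = F(a + h, b + k) - F(a, b)

    def gap(theta):
        x, y = a + theta * h, b + theta * k
        return h * F1(x, y) + k * F2(x, y) - target

    lo, hi = 0.0, 1.0
    if gap(lo) * gap(hi) > 0.0:
        raise ValueError("no sign change on [0, 1]; bisection does not apply")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(lo) * gap(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

theta = mean_value_theta(0.0, 0.0, 1.0, 1.0)
print(f"theta is approximately {theta:.9f}")
```

For this sample the condition reduces to $3\theta^2 + 1 = 2$, so the computed
value can be compared with $\theta = 1/\sqrt{3}$.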

Observe that the point $(a + \theta h, b + \theta k)$ is a point of the segment
$L$ somewhere between its ends (see Fig. 48).

When a set has the property that for each pair of points belonging to it the
straight line segment connecting them consists entirely of points which also
belong to the set, then the set is said to be convex.

If the domain $R$ of the function $F$ in Theorem VI is both open and convex,
then clearly (7.4-2) holds for every pair of points $(a, b)$ and
$(a + h, b + k)$ in $R$. It follows immediately that if both first partial
derivatives of $F$ vanish throughout $R$, then $F$ has the same value at
$(a + h, b + k)$ as at $(a, b)$. By holding $a$ and $b$ fixed while letting $h$
and $k$ vary, $(a + h, b + k)$ can be made to represent each point in $R$.
Therefore we see that $F$ must take on the same value at each point of its
domain; in other words, $F$ must be constant. This result is an analogue, for
functions of two variables, of Theorem V, §1.2.

Actually, we can get a more general analogue if we replace the condition of 
convexity by a weaker condition called connectedness. For our purpose here it 
will suffice to define connectedness for sets that are open. 

Definition. An open set $S$ in the plane (or in space of three dimensions) is
called connected in case each pair of points in $S$ can be joined by a path
consisting of a finite number of straight line segments joined end to end
consecutively, the whole path lying entirely in $S$ and not crossing itself
anywhere. Such a path may be called a polygonal arc.

Later, in §17.7, we shall give a general definition of connectedness, applicable
to any set, open or not. When that definition is applied to open sets, it turns
out to be equivalent to the definition which we are using here.

As an example of a nonconnected set, let $S$ consist of all points of the plane
for which $x^2 > 1$, that is, all points except those for which
$-1 \le x \le 1$. Plainly $S$ consists of two separated parts (see Fig. 49). Two
points, one in each part, cannot be joined by a broken-line path lying entirely
in $S$. This particular set $S$ comes naturally to our attention if we study the
function

\[ f(x, y) = y + \sqrt{x^2 - 1}. \]

Fig. 49.

Now we come to the theorem.

THEOREM VII. Let $F(x, y)$ be a function which is defined and differentiable
throughout a connected open set $S$, and suppose that the first partial
derivatives of $F$ vanish at each point of $S$. Then $F(x, y)$ is constant in
$S$.

Proof. Suppose $A$ and $B$ are any two points of $S$. Let them be joined by a
path consisting of segments $AP_1, P_1P_2, \ldots, P_{n-1}P_n, P_nB$, all lying
in $S$ (see Fig. 50). By the comment just after the proof of Theorem VI we see
that $F$ has the same value at $A$ as at $P_1$, the same value at $P_1$ as at
$P_2$, and so on, so that $F$ has the same value at $B$ as at $A$. This means
that $F$ is constant in $S$.

Theorems VI and VII admit of immediate extension to functions of more 
than two independent variables. The extension of the law of the mean for three 
independent variables is 

\[ F(a + h, b + k, c + l) - F(a, b, c) = hF_1(x, y, z) + kF_2(x, y, z) + lF_3(x, y, z), \tag{7.4-6} \]

where $x = a + \theta h$, $y = b + \theta k$, $z = c + \theta l$.

Formula (7.4-2) can also be written in the form

\[ F(x, y) - F(a, b) = F_1(X, Y)(x - a) + F_2(X, Y)(y - b), \tag{7.4-7} \]

where $(X, Y)$ is a certain point on the line joining $(a, b)$ and $(x, y)$.

In §2.7 we had occasion to refer to the fact that a set may be empty. For 
logical reasons we need to be aware that, even though a set is empty, it may 
have certain properties “by default.” For example, we cite the fact that the 
empty set is open. The logic of the situation is that a set S is called open if for 
every point P in S there is a neighborhood of P contained in S. If S is empty 
there is no point P in S and hence the requirement about a neighborhood of P 
has no force as a restriction on S. We say that the requirement “is satisfied 
vacuously,” which means that it is satisfied by default. Hence S is open. 
Similarly, an empty set is connected. 

EXERCISES 

1. Let $F(x, y) = xy^2 - x^2 y$. Find the appropriate value of $\theta$ in
(7.4-2) if (a) $a = b = 0$, $h = 1$, $k = 2$; (b) $a = b = 0$, $h = 3$, $k = 2$;
(c) $a = b = 1$, $h = 3$, $k = 2$.

2. Let $F(x, y)$ be the quadratic function $Ax^2 + 2Bxy + Cy^2$. Show that
(7.4-7) holds with $X = \tfrac{1}{2}(x + a)$, $Y = \tfrac{1}{2}(y + b)$. What
does this mean about the value of $\theta$ in (7.4-2)?

3. Taking $F(x, y) = \sin x \cos y$, prove that for some $\theta$ between 0 and 1
it is true that

\[ \frac{3}{4} = \frac{\pi}{3} \cos\frac{\pi\theta}{3} \cos\frac{\pi\theta}{6} - \frac{\pi}{6} \sin\frac{\pi\theta}{3} \sin\frac{\pi\theta}{6}. \]

4. (a) Write out formula (7.4-2) for $F(x, y) = \log(x e^{y^2})$, with $a = 1$,
$b = 0$, $h = e - 1$, $k = 1$. (b) Write out (7.4-7) for this same function,
with $a$, $b$, $x$, $y$ arbitrary, except that $a > 0$, $x > 0$.

5. If $x \ne a$ in (7.4-7), show that $Y = b + \dfrac{y - b}{x - a}(X - a)$.
Hence show that, under suitable conditions,
$F(1, 0) - F(0, 1) = F_1(X, 1 - X) - F_2(X, 1 - X)$.

6. Let $F(x, y) = (1 - 2xy + x^2)^{-1/2}$. As a result of considering
$F(1, 0) - F(0, 1)$, show that there is a number $\theta$ such that
$0 < \theta < 1$ and

\[ 1 - \sqrt{2} = \sqrt{2}\,(1 - 3\theta)(1 - 2\theta + 3\theta^2)^{-3/2}. \]

7. Let $F(x, y, z) = xyz$. Find the appropriate value of $\theta$ in (7.4-6) if
(a) $a = b = c = 0$, $h = k = l = 1$; (b) $a = b = 0$, $c = 1$, $h = k = 1$,
$l = -1$; (c) $a = c = 0$, $b = 1$, $h = l = 1$, $k = 0$.

8. Show that the open disc $\{(x, y) : x^2 + y^2 < 1\}$ is convex.

Hint: Begin by showing that the line segment determined by any two points
$(a, b)$ and $(a + h, b + k)$ in the disc is the set of all points of the form
$(a + th, b + tk)$ where $0 \le t \le 1$.

(Actually, we still have a convex set if we join to this open disc some or all of its 
boundary.) 

9. Explain why the empty set is connected, and why every set consisting of just one 
point is connected. 


7.5 / TAYLOR’S FORMULA AND SERIES

Just as we extended the ordinary law of the mean to functions of several 
variables, so we may extend the version of Taylor’s formula given in §4.3. The 
method is the same as that employed in the proof of Theorem VI, §7.4. We write 

f(t) = F(a + th, b + tk) (7.5-1) 

and apply Taylor’s formula to $f(t)$, using the two values $t = 0$, $t = 1$.
From (4.3-7) with $a = 0$, $h = 1$ we have

\[ f(1) = f(0) + f'(0) + \cdots + \frac{f^{(n)}(0)}{n!} + \frac{f^{(n+1)}(\theta)}{(n + 1)!}, \qquad 0 < \theta < 1. \tag{7.5-2} \]

The assumptions are that $F$ and its partial derivatives of orders 1 to $n$
inclusive are differentiable at all points along the line joining $(a, b)$ and
$(a + h, b + k)$. The main problem now is that of calculating the higher
derivatives of $f$ from (7.5-1). The first derivative is given by (7.4-4) in the
previous section. Working from that formula, we see that

\[ f''(t) = h[hF_{11} + kF_{12}] + k[hF_{21} + kF_{22}], \]

where all the partial derivatives on the right are evaluated at
$(a + th, b + tk)$. Since $F_{12} = F_{21}$ (Theorem IV, §7.2), we have

\[ f''(t) = h^2 F_{11} + 2hk F_{12} + k^2 F_{22}. \]

This is sometimes written in the form

\[ f''(t) = \left[ \left( h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y} \right)^2 F(x, y) \right]_{x = a + th,\ y = b + tk}, \]

it being understood that

\[ \left( h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y} \right)^2 F(x, y) = h^2 \frac{\partial^2 F}{\partial x^2} + 2hk \frac{\partial^2 F}{\partial x\,\partial y} + k^2 \frac{\partial^2 F}{\partial y^2}. \]

The analogy with the pattern of the binomial expansion is now evident. We have

\[ f'''(t) = h^3 F_{111} + 3h^2 k F_{112} + 3hk^2 F_{122} + k^3 F_{222} = \left( h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y} \right)^3 F(x, y), \]

with $x$ and $y$ set equal to $a + th$ and $b + tk$, respectively, in the partial
derivatives. The general formula is

\[ f^{(n)}(t) = \left[ \left( h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y} \right)^n F(x, y) \right]_{x = a + th,\ y = b + tk}. \tag{7.5-3} \]
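
The case $n = 2$ of this formula, namely
$f''(t) = h^2 F_{11} + 2hk F_{12} + k^2 F_{22}$, can be compared with a second
difference quotient of $f(t) = F(a + th, b + tk)$. The Python sketch below does
this for one sample; the function $F(x, y) = e^x \cos y$ and the numbers
$a, b, h, k, t$ are assumptions chosen for illustration.

```python
import math

def F(x, y):
    return math.exp(x) * math.cos(y)   # sample function (an assumption)

# Second-order partial derivatives of this F, computed by hand.
def F11(x, y):
    return math.exp(x) * math.cos(y)

def F12(x, y):
    return -math.exp(x) * math.sin(y)

def F22(x, y):
    return -math.exp(x) * math.cos(y)

a, b, h, k = 0.3, 0.2, 0.5, -0.4       # illustrative values

def f(t):
    # The function of one variable along the line through (a, b).
    return F(a + t * h, b + t * k)

def second_derivative_formula(t):
    # h^2 F_11 + 2hk F_12 + k^2 F_22, evaluated at (a + th, b + tk).
    x, y = a + t * h, b + t * k
    return h * h * F11(x, y) + 2.0 * h * k * F12(x, y) + k * k * F22(x, y)

t, d = 0.5, 1e-4
numerical = (f(t + d) - 2.0 * f(t) + f(t - d)) / (d * d)
formula = second_derivative_formula(t)
print(f"formula: {formula:.8f}, second difference quotient: {numerical:.8f}")
```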

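To get Taylor’s formula for $F(x, y)$ we use (7.5-1) and (7.5-3) to substitute in
(7.5-2). For $n = 1$ the result is

\[ F(a + h, b + k) = F(a, b) + hF_1(a, b) + kF_2(a, b) + R_2, \tag{7.5-4} \]

where

\[ R_2 = \frac{1}{2!} \left[ \left( h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y} \right)^2 F(x, y) \right]_{x = a + \theta h,\ y = b + \theta k} \]

and $0 < \theta < 1$. For $n = 2$ the result is

\[ F(a + h, b + k) = F(a, b) + hF_1(a, b) + kF_2(a, b) + \frac{1}{2!}\left[ h^2 F_{11}(a, b) + 2hk F_{12}(a, b) + k^2 F_{22}(a, b) \right] + R_3, \tag{7.5-5} \]

where

\[ R_3 = \frac{1}{3!} \left[ \left( h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y} \right)^3 F(x, y) \right]_{x = a + \theta h,\ y = b + \theta k} \]

and $0 < \theta < 1$. The extension to higher values of $n$ is obvious. Observe
that from $f^{(n)}(0)$ we get a homogeneous polynomial of degree $n$ in $h$ and
$k$, the coefficient of $h^{n-p} k^p$ being

\[ \frac{n!}{p!\,(n - p)!}\,\frac{\partial^n F}{\partial x^{n-p}\,\partial y^p}, \]

the partial derivative being evaluated at $x = a$, $y = b$.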

Under certain conditions on the function $F$, the point $(a, b)$, and the size of
$h$ and $k$, it may happen that $F(a + h, b + k)$ can be represented as the
infinite series

\[ F(a + h, b + k) = F(a, b) + \sum_{n=1}^{\infty} \frac{1}{n!} \left[ \left( h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y} \right)^n F(x, y) \right]_{x = a,\ y = b}. \tag{7.5-6} \]

This is the form of Taylor’s series for a function of two variables. If we use
only a specified number of terms of this series we get an approximate expression
for $F(a + h, b + k)$. Frequently it is more convenient to write $(x, y)$ in
place of $(a + h, b + k)$. The typical term of the series then becomes a
homogeneous polynomial in $(x - a)$ and $(y - b)$.

Example. Write Taylor’s formula (7.5-4) with $n = 1$, and carry the series
(7.5-6) through the term in $n = 2$, if $F(x, y) = 1/(xy)$, $a = 1$, $b = -1$.

From $F(x, y) = x^{-1} y^{-1}$ we readily find

\[ \frac{\partial F}{\partial x} = -x^{-2} y^{-1}, \qquad \frac{\partial F}{\partial y} = -x^{-1} y^{-2}, \]

\[ \frac{\partial^2 F}{\partial x^2} = 2x^{-3} y^{-1}, \qquad \frac{\partial^2 F}{\partial x\,\partial y} = x^{-2} y^{-2}, \qquad \frac{\partial^2 F}{\partial y^2} = 2x^{-1} y^{-3}. \]

It is now easily seen that (7.5-4) becomes

\[ \frac{1}{(1 + h)(-1 + k)} = -1 + h - k + R_2, \]

\[ R_2 = \frac{h^2}{(1 + \theta h)^3(-1 + \theta k)} + \frac{hk}{(1 + \theta h)^2(-1 + \theta k)^2} + \frac{k^2}{(1 + \theta h)(-1 + \theta k)^3}. \]

The series (7.5-6) begins

\[ \frac{1}{(1 + h)(-1 + k)} = -1 + (h - k) + (-h^2 + hk - k^2) + \cdots. \]

Detailed verification should be supplied by the student as he reads this example.
If we write $x = 1 + h$, $y = -1 + k$, the last formula becomes

\[ \frac{1}{xy} = -1 + [(x - 1) - (y + 1)] + [-(x - 1)^2 + (x - 1)(y + 1) - (y + 1)^2] + \cdots. \]

We shall not investigate the precise limitations on $(x - 1)$ and $(y + 1)$ which
are necessary in this expansion.
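
As a partial check on the example, the Python sketch below compares $1/(xy)$
near $(1, -1)$ with the terms of the series through degree 2,
$-1 + (h - k) + (-h^2 + hk - k^2)$. The direction $(h, k) = (t, -t/2)$ and the
values of $t$ are assumptions chosen for illustration. If the polynomial is
correct, the error should behave like a multiple of $t^3$, so halving $t$ should
divide the error by about 8.

```python
def F(x, y):
    return 1.0 / (x * y)

def taylor_degree_2(h, k):
    # Terms of the series about (1, -1) through degree 2, as in the example.
    return -1.0 + (h - k) + (-h * h + h * k - k * k)

errors = []
for t in (0.1, 0.05, 0.025):
    h, k = t, -t / 2.0    # an illustrative direction of approach
    errors.append(abs(F(1.0 + h, -1.0 + k) - taylor_degree_2(h, k)))

ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
for e in errors:
    print(f"error = {e:.3e}")
print("ratios on halving t:", [round(r, 2) for r in ratios])
```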

There are situations in which we need the Taylor series with remainder for
functions of more than two variables. Suppose for example that we wish to expand
a function $F(x_1, x_2, \ldots, x_n)$ about the point
$(a_1, a_2, \ldots, a_n)$. One way of writing the expansion to terms of degree
$k$, followed by a remainder, is

\[ F(a_1 + h_1, a_2 + h_2, \ldots, a_n + h_n) = F(a_1, a_2, \ldots, a_n) + \sum_{r=1}^{k} \frac{1}{r!} \left( h_1\frac{\partial}{\partial x_1} + h_2\frac{\partial}{\partial x_2} + \cdots + h_n\frac{\partial}{\partial x_n} \right)^r F + \frac{1}{(k + 1)!} \left( h_1\frac{\partial}{\partial x_1} + h_2\frac{\partial}{\partial x_2} + \cdots + h_n\frac{\partial}{\partial x_n} \right)^{k+1} F, \]

where all the derivatives up through those of order $k$ are evaluated at the
point $(a_1, a_2, \ldots, a_n)$, and those of order $k + 1$ are evaluated at
$(a_1 + \theta h_1, a_2 + \theta h_2, \ldots, a_n + \theta h_n)$, where $\theta$
is some number properly between 0 and 1. The line through
$(a_1, a_2, \ldots, a_n)$ and $(a_1 + h_1, a_2 + h_2, \ldots, a_n + h_n)$ has
parametric equations $x_1 = a_1 + th_1$, $x_2 = a_2 + th_2$, \ldots,
$x_n = a_n + th_n$. Therefore the point at which the $(k + 1)$st order partial
derivatives in the remainder term are evaluated is between the two points where
$t = 0$ and $t = 1$.

Another way of saying the same thing results from letting $x_1 = a_1 + h_1$,
$x_2 = a_2 + h_2$, \ldots, $x_n = a_n + h_n$. Replacing the $h$’s by their
equivalents in terms of the $x$’s gives

\[ F(x_1, x_2, \ldots, x_n) = F(a_1, a_2, \ldots, a_n) + \sum_{r=1}^{k} \frac{1}{r!} \left[ (x_1 - a_1)\frac{\partial}{\partial x_1} + \cdots + (x_n - a_n)\frac{\partial}{\partial x_n} \right]^r F + \frac{1}{(k + 1)!} \left[ (x_1 - a_1)\frac{\partial}{\partial x_1} + \cdots + (x_n - a_n)\frac{\partial}{\partial x_n} \right]^{k+1} F, \]

where all derivatives of order less than or equal to $k$ are evaluated at
$(a_1, a_2, \ldots, a_n)$ and the derivatives in the remainder term, all of which
are of order $(k + 1)$, are evaluated at some point $P^*$ on the line segment
connecting $(a_1, \ldots, a_n)$ and $(x_1, \ldots, x_n)$.

In the theory of maxima and minima of functions of several variables, we are
interested in the case where $k = 1$. The above formula then reduces to

\[ \begin{aligned}
F(x_1, x_2, \ldots, x_n) &= F(a_1, a_2, \ldots, a_n) + (x_1 - a_1)F_1(a_1, a_2, \ldots, a_n) + \cdots + (x_n - a_n)F_n(a_1, a_2, \ldots, a_n) + \frac{1}{2!} \sum_{i,j=1}^{n} (x_i - a_i)(x_j - a_j)F_{ij}(P^*) \\
&= F(a_1, a_2, \ldots, a_n) + \sum_{i=1}^{n} (x_i - a_i)F_i(a_1, a_2, \ldots, a_n) + \frac{1}{2!} \sum_{i,j=1}^{n} (x_i - a_i)(x_j - a_j)F_{ij}(P^*),
\end{aligned} \tag{7.5-7} \]

where $F_{ij}$ is evaluated at some point of the form
$[a_1 + \theta(x_1 - a_1), a_2 + \theta(x_2 - a_2), \ldots, a_n + \theta(x_n - a_n)]$,
in other words, at some point on the line segment connecting
$(a_1, a_2, \ldots, a_n)$ and $(x_1, x_2, \ldots, x_n)$. The remainder term in
this special case can be recognized as a quadratic form in the $n$ variables
$x_1 - a_1, x_2 - a_2, \ldots, x_n - a_n$. See §6.9, and recall that by Theorem
III $F_{ij} = F_{ji}$ under very general hypotheses on $F$.


EXERCISES 

1. Write Taylor’s formula (7.5-5) for $F(x, y) = \sin x \sin y$, using $a = 0$,
$b = 0$, and $n = 2$.

2. Write Taylor’s formula for F(x, y) = cos x cos y, using a = 0, b = 0, and n = 2. 

3. Write Taylor’s formula with a=3, b = 3, n = 3 for F(x, y) = x 3 + y 3 - 9xy + 27. 

4. Let F(x, y) = log(x + e y ). Expand according to Taylor’s series in powers of x - 1 
and y, going far enough to include all terms of degree 2 in these quantities. 

5. Follow the instructions of Exercise 4 for F(x, y) = sin(e y +x 2 -2). 

6. Write Taylor’s series for e x cos y in powers of x and y, going far enough to 
include all terms of third degree. 

7. Write Taylor’s series for $e^{-y^2 + 2xy}$ in powers of $x$ and $y$, going far
enough to include all terms of fourth degree.

8. Write Taylor’s series for $x^2 y + xy^2 + 1$ in powers of $x - 1$ and $y - 1$.

9. Write Taylor’s series for xy 3 — y 2 + y + 2 in powers of x — 1 and y — 2. 

10. (a) Find a linear function of x and y which is a good approximation for


F(x, y) = tan 


/'XZJL') 

\l + xv 


when x and y are small. 


(b) Write the constant and linear terms in Taylor’s series of F(x, y) in powers of x - 3 and 


11. Write out in full the expression 


b. ( h i +k ^ F(x ’ y) - 


What does the expression become (a) if $F(x, y) = x^4 - x^2 y^2 + y^4$; (b) if
$F(x, y) = \sin xy$ and if one sets $x = y = (\pi/2)^{1/2}$ after doing the
differentiation?

12. (a) Carry on the work of the illustrative example in the text, showing that

\[ \frac{\partial^n F}{\partial x^{n-p}\,\partial y^p} = \frac{(-1)^n (n - p)!\,p!}{x^{n-p+1}\,y^{p+1}}, \]

and that the polynomial of degree $n$ in $h$ and $k$ in the Taylor’s series is

\[ (-1)^{n-1}\left[ h^n - h^{n-1}k + h^{n-2}k^2 - \cdots + (-1)^n k^n \right]. \]

(b) Assuming that $|h| < 1$ and $|k| < 1$, write

\[ \frac{1}{(1 + h)(-1 + k)} = -(1 + h)^{-1}(1 - k)^{-1} = -(1 - h + h^2 - h^3 + \cdots)(1 + k + k^2 + \cdots). \]


Then multiply the two series together term by term, and collect together the terms of like 
degree. Compare with the result found in (a). 


7.6 / SUFFICIENT CONDITIONS FOR A RELATIVE EXTREME 

In §6.3 we discussed relative maxima and minima for a function $f(x, y)$. In
Theorem I of that section we reached the important conclusion that if $f$
attains a relative extreme value at an interior point of its region of
definition, then necessarily the partial derivatives $\partial f/\partial x$ and
$\partial f/\partial y$ vanish at the point (provided these derivatives exist,
of course). The conditions

\[ \frac{\partial f}{\partial x} = 0, \qquad \frac{\partial f}{\partial y} = 0 \tag{7.6-1} \]

at a point do not in themselves guarantee a relative extreme, however. In this
section we wish to develop criteria which, taken together with conditions
(7.6-1), will guarantee a relative extreme, and enable us to distinguish a
relative maximum from a relative minimum.

It will be helpful if we begin by reviewing briefly the analogous considerations
for a function of one variable. Suppose we have a function $y = f(x)$ defined on
some interval having $x = a$ as an interior point. We suppose $f$ to be
differentiable on the interval, and we assume that the second derivative exists
at $x = a$.

THEOREM VIII. Under the conditions on $f$ as just stated, suppose that
$f'(a) = 0$. Then

(i) If $f''(a) > 0$, $f$ has a relative minimum at $x = a$;

(ii) If $f''(a) < 0$, $f$ has a relative maximum at $x = a$;

(iii) If $f''(a) = 0$ no conclusion may be drawn; $f$ may have a relative
maximum or minimum, or it may have neither.

Proof. Consider the value of $f$ at any point $x = a + h$ near $x = a$. We take
$h \ne 0$, but it may be either positive or negative. By the law of the mean,

\[ f(a + h) - f(a) = h f'(a + \theta h), \qquad 0 < \theta < 1. \tag{7.6-2} \]

Now, by the definition of $f''(a)$, we have

\[ f''(a) = \lim_{\Delta x \to 0} \frac{f'(a + \Delta x) - f'(a)}{\Delta x}. \]

Since $f'(a) = 0$ we can write this in the form

\[ \frac{f'(a + \Delta x)}{\Delta x} = f''(a) + \epsilon, \]

where $\epsilon$ is a variable quantity such that $\epsilon \to 0$ as
$\Delta x \to 0$. Consequently, if we choose $\Delta x = \theta h$, we have

\[ f'(a + \theta h) = (f''(a) + \epsilon)\theta h, \]

and (7.6-2) becomes

\[ f(a + h) - f(a) = (f''(a) + \epsilon)\theta h^2. \tag{7.6-3} \]

If we now suppose that $f''(a) \ne 0$, we see that, as soon as $h$ is small
enough to insure that $|\epsilon| < |f''(a)|$, the sign of the left member of
(7.6-3) is the same as the sign of $f''(a)$. Consequently, if $f''(a) > 0$, we
conclude that $f(a + h) > f(a)$ for all sufficiently small values of $h$
different from zero. This means that $f$ has a relative minimum at $x = a$. We
have thus proved part (i) of the theorem; the same kind of argument is used for
part (ii). If $f''(a) = 0$ no conclusion is reached, however, since we do not
know the sign of $\epsilon$ in (7.6-3). The three examples $y = x^4$,
$y = -x^4$, $y = x^3$, with $a = 0$, show that any of the possibilities
mentioned in part (iii) may arise.
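
The three examples can be examined directly. In the Python sketch below the
second-derivative test returns "no conclusion" for each of them, since
$f''(0) = 0$ in every case, while a comparison of $f(\pm h)$ with $f(0)$ suggests
three different behaviors. The helper names and the step $h = 0.01$ are
assumptions made for illustration, and sampling two nearby points is only
suggestive, not a proof.

```python
def second_derivative_test(second_derivative_at_a):
    # Conclusion of Theorem VIII, given f'(a) = 0 and the value f''(a).
    if second_derivative_at_a > 0:
        return "relative minimum"
    if second_derivative_at_a < 0:
        return "relative maximum"
    return "no conclusion"

def behavior_near(f, a, h=1e-2):
    # Compare f at a - h and a + h with f(a); suggestive only.
    left = f(a - h) - f(a)
    right = f(a + h) - f(a)
    if left > 0 and right > 0:
        return "relative minimum"
    if left < 0 and right < 0:
        return "relative maximum"
    return "neither"

# Each entry: (function, value of its second derivative at 0).
examples = {
    "x^4": (lambda x: x ** 4, 0.0),
    "-x^4": (lambda x: -(x ** 4), 0.0),
    "x^3": (lambda x: x ** 3, 0.0),
}

results = {}
for name, (f, fpp0) in examples.items():
    results[name] = (second_derivative_test(fpp0), behavior_near(f, 0.0))
    print(f"{name}: test gives '{results[name][0]}', samples suggest '{results[name][1]}'")
```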

Let us now turn to the case of two independent variables. We have the following
corresponding theorem:

THEOREM IX. Suppose that $F(x, y)$ is defined and differentiable throughout a
region $R$ of which $(a, b)$ is an interior point, and suppose that the first
partial derivatives of $F$ vanish at that point. Suppose further that the
partial derivatives $F_1$ and $F_2$ are differentiable at $(a, b)$. Let us write

\[ A = F_{11}(a, b), \qquad B = F_{12}(a, b), \qquad C = F_{22}(a, b). \]

Then

(i) If $B^2 - AC < 0$ and $A > 0$, $F$ has a relative minimum at $(a, b)$;

(ii) If $B^2 - AC < 0$ and $A < 0$, $F$ has a relative maximum at $(a, b)$;

(iii) If $B^2 - AC > 0$, $F$ has neither a maximum nor a minimum at the point;

(iv) If $B^2 - AC = 0$, no conclusion may be drawn and any of the behaviors of
$F$ described in parts (i)-(iii) may occur.
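
The four cases of the theorem translate directly into a small decision
procedure. The Python sketch below is one way of writing it; the function name
and the sample values of $A$, $B$, $C$ (which are the second derivatives at the
origin of $x^2 + y^2$, $-x^2 - y^2$, $xy$, and $x^4 + y^4$) are assumptions made
for illustration. With floating-point data the test $B^2 - AC = 0$ should be
applied with caution.

```python
def classify_critical_point(A, B, C):
    # A = F_11(a, b), B = F_12(a, b), C = F_22(a, b) at a point where
    # F_1 = F_2 = 0.  Returns the conclusion of Theorem IX.
    discriminant = B * B - A * C
    if discriminant < 0:
        # Here A*C > B*B >= 0, so A cannot be zero.
        return "relative minimum" if A > 0 else "relative maximum"
    if discriminant > 0:
        return "neither a maximum nor a minimum"
    return "no conclusion"

print(classify_critical_point(2.0, 0.0, 2.0))    # x^2 + y^2 at the origin
print(classify_critical_point(-2.0, 0.0, -2.0))  # -x^2 - y^2 at the origin
print(classify_critical_point(0.0, 1.0, 0.0))    # xy at the origin
print(classify_critical_point(0.0, 0.0, 0.0))    # x^4 + y^4 at the origin
```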

Proof. Note that part (iii) of Theorem IX has no counterpart in Theorem VIII.
The method of proof of Theorem IX is similar to that of Theorem VIII. We consider
the point $(a + h, b + k)$, where $h$ and $k$ are both small, but not both zero.
By the law of the mean we have (with $0 < \theta < 1$)

\[ F(a + h, b + k) - F(a, b) = hF_1(a + \theta h, b + \theta k) + kF_2(a + \theta h, b + \theta k). \]

Since we assumed $F_1$ and $F_2$ differentiable, we may write (see the
discussion of differentiability near the end of §7)

\[ F_1(a + \theta h, b + \theta k) - F_1(a, b) = \theta h F_{11}(a, b) + \theta k F_{12}(a, b) + \epsilon_1(|\theta h| + |\theta k|), \]

where $\epsilon_1 \to 0$ as $h$ and $k \to 0$. A similar expression may be
written for $F_2$, with some quantity $\epsilon_2$ in place of $\epsilon_1$.
Since $F_1(a, b) = F_2(a, b) = 0$, by hypothesis, we have

\[ F(a + h, b + k) - F(a, b) = \theta\left[ (Ah^2 + 2Bhk + Ck^2) + (|h| + |k|)(\epsilon_1 h + \epsilon_2 k) \right]. \tag{7.6-4} \]

This formula is the counterpart of (7.6-3). To get the information we desire
from it, however, it is more convenient to express $h$ and $k$ in terms of polar
co-ordinates with origin at $(a, b)$. Let us write

\[ h = r \cos \phi, \qquad k = r \sin \phi. \]

When these expressions are substituted in (7.6-4), a factor $r^2$ can be taken
out, and we get

\[ F(a + h, b + k) - F(a, b) = \theta r^2 [G(\phi) + \delta], \tag{7.6-5} \]

where for abbreviation we have set

\[ G(\phi) = A \cos^2 \phi + 2B \sin \phi \cos \phi + C \sin^2 \phi, \]

\[ \delta = (|\cos \phi| + |\sin \phi|)(\epsilon_1 \cos \phi + \epsilon_2 \sin \phi). \]

Here $0 < \theta < 1$, and $\delta \to 0$ as $r \to 0$. Since $G(\phi)$ is
independent of $r$, it is clear that, if $G(\phi) \ne 0$, the sign of the left
member of (7.6-5) will be the same as the sign of $G(\phi)$ when $r$ is
sufficiently small. Moreover, $G(\phi)$ is continuous, whence it follows that,
if $G(\phi)$ is never zero, it always has the same sign, and, when $r$ is
sufficiently small, the sign of $G(\phi) + \delta$ will be the same as the sign
of $G(\phi)$. Therefore, if $G(\phi)$ is always positive, $F$ has a minimum at
$(a, b)$, while if $G(\phi)$ is always negative, $F$ has a maximum at $(a, b)$.
On the other hand, if $G(\phi)$ is sometimes positive and sometimes negative,
$F$ has neither a maximum nor a minimum at $(a, b)$. We shall show that the
cases (i)-(iii) in the statement of Theorem IX lead to exactly these three types
of behavior for $G(\phi)$.
Observe that

$$Ah^2 + 2Bhk + Ck^2 = r^2 G(\phi).$$

This shows that the sign of $G(\phi)$ is the same as the sign
of the quadratic function

$$f(h, k) = Ah^2 + 2Bhk + Ck^2.$$

Fig. 51.


Now let us regard $h$, $k$ as rectangular co-ordinates in
a system with origin at the point $x = a$, $y = b$. Let $h'$,
$k'$ denote rectangular co-ordinates in a system
obtained from the $hk$-system by a rotation about the
origin of the system (see Fig. 51). As we saw in §6.9,
it is possible to choose the rotation in such a way that $f(h, k)$ becomes

$$\lambda_1 h'^2 + \lambda_2 k'^2 = r^2 G(\phi), \tag{7.6-6}$$

where $\lambda_1$ and $\lambda_2$ are roots of the equation

$$\begin{vmatrix} A - \lambda & B \\ B & C - \lambda \end{vmatrix} = \lambda^2 - (A + C)\lambda + AC - B^2 = 0. \tag{7.6-7}$$

Here $h^2 + k^2 = h'^2 + k'^2 = r^2$. Observe that the product of the roots of (7.6-7) is

$$\lambda_1 \lambda_2 = AC - B^2, \tag{7.6-8}$$

and that the sum is

$$\lambda_1 + \lambda_2 = A + C. \tag{7.6-9}$$

Everything now depends on the sign of the expression (7.6-6). It is clear that if
$\lambda_1$ and $\lambda_2$ are both positive, $G(\phi)$ is positive, and we have the case of a minimum
at $(a, b)$, whereas if $\lambda_1$ and $\lambda_2$ are both negative, we have the case of a maximum
at $(a, b)$. Let us now consider cases (i) and (ii) of the theorem. The hypothesis
$B^2 - AC < 0$ implies that $A$ and $C$ are of the same sign, and also, by (7.6-8), that
$\lambda_1$ and $\lambda_2$ are of the same sign. Consequently, from (7.6-9) we see that
$B^2 - AC < 0$ and $A > 0$ imply that $\lambda_1$ and $\lambda_2$ are positive, whereas $B^2 - AC < 0$
and $A < 0$ imply that $\lambda_1$ and $\lambda_2$ are negative. The conclusions in cases (i) and (ii)
are therefore established by the foregoing arguments.

In case (iii), $B^2 - AC > 0$ implies that $\lambda_1$ and $\lambda_2$ are of opposite signs. Now
we can choose $\phi$ so as to make $h' = 0$, $k' \neq 0$, $r^2 G(\phi) = \lambda_2 k'^2$, and we can also
choose $\phi$ so as to make $h' \neq 0$, $k' = 0$, $r^2 G(\phi) = \lambda_1 h'^2$. Thus $G(\phi)$ can change
sign, and so we have neither a maximum nor a minimum at $(a, b)$.
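The relations (7.6-8) and (7.6-9) between the roots and the coefficients $A$, $B$, $C$ are easy to check numerically. The sketch below solves the quadratic (7.6-7) directly; the helper name `quadratic_form_roots` and the sample coefficients are assumptions chosen only for illustration.

```python
import math


def quadratic_form_roots(A, B, C):
    """Roots of lambda^2 - (A + C) lambda + (AC - B^2) = 0, equation (7.6-7)."""
    half_sum = (A + C) / 2.0
    # The discriminant of (7.6-7) equals (A - C)^2 + 4 B^2, so it is never
    # negative and both roots are real.
    radius = math.sqrt((A - C) ** 2 + 4.0 * B * B) / 2.0
    return half_sum - radius, half_sum + radius


# Case (i): B^2 - AC = 1 - 6 < 0 and A > 0, so both roots should be positive.
low, high = quadratic_form_roots(2, 1, 3)
print(low > 0 and high > 0)

# Case (iii): B^2 - AC = 4 - 1 > 0, so the roots should have opposite signs.
low, high = quadratic_form_roots(1, 2, 1)
print(low < 0 < high)
```

In both samples the product of the two roots equals $AC - B^2$ and their sum equals $A + C$, up to rounding, which is all that the argument above uses.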

Finally, if $B^2 - AC = 0$, at least one of the roots $\lambda_1$ and $\lambda_2$ is zero, by (7.6-8).
No conclusion about a maximum or minimum can be drawn in this case. The
reasons for this are clear from (7.6-6) and (7.6-5). As examples we cite the three
functions
Each of these functions comes under case (iv) of Theorem IX, with $a = b = 0$.
The first has a minimum at (0, 0), the second a maximum, while the third has
neither.


A point at which all the first partial derivatives of a function are equal to
zero is called a critical point of the function. If $(a, b)$ is a critical point of
$F(x, y)$, and if $B^2 - AC \neq 0$ at $(a, b)$ (in the notation of Theorem IX), we call the
critical point nondegenerate. The point is called a saddle point if $B^2 - AC > 0$.

It should be noted that, if we assume that $F$ has partial derivatives of all
orders, and that it can be represented by its Taylor’s series (see §7.5), then, when
$(a, b)$ is a critical point, the Taylor’s series starts as follows:

$$F(a + h, b + k) = F(a, b) + \tfrac{1}{2}[Ah^2 + 2Bhk + Ck^2] + \cdots.$$

Observe also that $AC - B^2$ is the discriminant of the quadratic form

$$Ah^2 + 2Bhk + Ck^2.$$

This gives us a clue to the proper method of defining nondegenerate critical
points for functions of more than two variables. We illustrate briefly for the case
of three variables. Suppose the origin is a critical point of the function
$F(x_1, x_2, x_3)$, so that $F_1(0, 0, 0) = F_2(0, 0, 0) = F_3(0, 0, 0) = 0$. Let $a_{11} = F_{11}(0, 0, 0)$,
$a_{12} = F_{12}(0, 0, 0)$, $a_{13} = F_{13}(0, 0, 0)$, and so on.

Then Taylor’s series is

$$F(x_1, x_2, x_3) = F(0, 0, 0) + \frac{1}{2} \sum_{i,j=1}^{3} a_{ij} x_i x_j + \cdots.$$

The critical point is called nondegenerate in case the determinant

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}$$

is not zero. For nondegenerate critical points there is a generalization of
Theorem IX. We consider the roots $\lambda_1$, $\lambda_2$, $\lambda_3$ of the equation

$$\begin{vmatrix} a_{11} - \lambda & a_{12} & a_{13} \\ a_{21} & a_{22} - \lambda & a_{23} \\ a_{31} & a_{32} & a_{33} - \lambda \end{vmatrix} = 0.$$

If these roots are all of the same sign, $F$ has a minimum at the critical point if
the roots are all positive, and a maximum there if they are all negative. But, if
some roots are positive and some negative, there is neither a maximum nor a
minimum at the critical point. These facts follow quickly from the discussion of
quadratic forms in §6.9. It is not actually necessary to assume the convergence
of Taylor’s series. All that is needed is that the first derivatives of $F$ be
differentiable at the critical point.
We shall now indicate how the foregoing critical point theory can be
developed for functions of more than two variables. We suppose that we have a
function $F(x_1, \ldots, x_n)$ of $n$ independent variables, defined and differentiable at
all points in a region including $(a_1, \ldots, a_n)$ as an interior point. We shall assume
that $(a_1, \ldots, a_n)$ is a critical point of $F$, that is, that all the first partial derivatives
of $F$ vanish at this point. And finally we assume that each of the first partial
derivatives of $F$ is differentiable at least at this one point $(a_1, \ldots, a_n)$. By an easy
extension of Theorem IV to $n$ variables, these hypotheses assure us that the
second partial derivatives exist at the critical point, and that the symmetrical
relations of the type $\dfrac{\partial^2 F}{\partial x_i \, \partial x_j} = \dfrac{\partial^2 F}{\partial x_j \, \partial x_i}$ are all valid. As usual, we use the notation
$\dfrac{\partial^2 F}{\partial x_j \, \partial x_i} = F_{ij}$. From now on we restrict our attention to the points contained within
some ball centered at $(a_1, \ldots, a_n)$ where the radius of the ball is small enough so
that all points within the ball belong also to the region in which our assumptions
about $F$ hold.

To compare the value of $F$ at $(x_1, \ldots, x_n)$ with the value at $(a_1, \ldots, a_n)$ we use the
law of the mean. This was proved for functions of two variables in Theorem VI,
and later in §7.4 it was pointed out that the natural extension to functions of $n$
variables is also valid. For convenience let $h_i = x_i - a_i$. Then there is some $\theta$
$(0 < \theta < 1)$ such that

$$F(x_1, \ldots, x_n) - F(a_1, \ldots, a_n) = \sum_{i=1}^{n} h_i F_i(a_1 + \theta h_1, \ldots, a_n + \theta h_n). \tag{7.6-10}$$

Next, we use the differentiability of all the $F_i$’s at $(a_1, \ldots, a_n)$. It enables us to
write

$$F_i(a_1 + \Delta x_1, \ldots, a_n + \Delta x_n) - F_i(a_1, \ldots, a_n) = \sum_{j=1}^{n} F_{ij}(a_1, \ldots, a_n)\, \Delta x_j + \epsilon_i \sqrt{(\Delta x_1)^2 + \cdots + (\Delta x_n)^2}, \tag{7.6-11}$$

where $\epsilon_i$ depends on $\Delta x_1, \ldots, \Delta x_n$ in such a way that $\epsilon_i \to 0$ when all the $\Delta x_j$’s $\to 0$.
See (7.2). If we now put $\Delta x_i = \theta h_i$ in (7.6-11) and combine with (7.6-10), bearing
in mind that the first order partial derivatives all vanish at the critical point, we
see that

$$F(x_1, \ldots, x_n) - F(a_1, \ldots, a_n) = \theta \left[ \sum_{i,j=1}^{n} F_{ij}(a_1, \ldots, a_n)\, h_i h_j \right] + \theta (h_1^2 + \cdots + h_n^2)^{1/2} \sum_{i=1}^{n} \epsilon_i h_i. \tag{7.6-12}$$

The value of $\theta$ is unknown but we do know that it is properly between 0 and 1
and hence positive. Consequently, the sign of the difference $F(x_1, \ldots, x_n) -
F(a_1, \ldots, a_n)$ is the same as the sign of

$$\sum_{i,j=1}^{n} F_{ij}(a_1, \ldots, a_n)\, h_i h_j + (h_1^2 + \cdots + h_n^2)^{1/2} \sum_{i=1}^{n} \epsilon_i h_i. \tag{7.6-13}$$
The first summation is a quadratic form in the variables $h_1, \ldots, h_n$. The absolute
value of the second part is small in comparison with $(h_1^2 + \cdots + h_n^2)$ when the
latter is small, because the $\epsilon_i$’s approach zero as the $h_i$’s $\to 0$. In fact, by Cauchy’s
inequality (see §6.8, Exercise 8),

$$(h_1^2 + \cdots + h_n^2)^{1/2} \left| \sum_{i=1}^{n} \epsilon_i h_i \right| \le (h_1^2 + \cdots + h_n^2)^{1/2} \left( \sum_{i=1}^{n} \epsilon_i^2 \right)^{1/2} \left( \sum_{i=1}^{n} h_i^2 \right)^{1/2} = \left( \sum_{i=1}^{n} \epsilon_i^2 \right)^{1/2} \left( \sum_{i=1}^{n} h_i^2 \right),$$

justifying the assertion made in the preceding sentence.

This brings us to the essential aspect of our theory, and we focus attention
on the quadratic form

$$Q(h_1, \ldots, h_n) = \sum_{i,j=1}^{n} F_{ij}(a_1, \ldots, a_n)\, h_i h_j.$$

If this quadratic form remains always of the same sign, no matter how the $h_i$’s
are chosen, so long as they are not all zero, then as we shall see in a moment, the
sign of the quadratic form and the sign of $F(x_1, \ldots, x_n) - F(a_1, \ldots, a_n)$ will be
the same when $(x_1, \ldots, x_n)$ is sufficiently close to $(a_1, \ldots, a_n)$, that is, when
$(h_1^2 + \cdots + h_n^2)^{1/2}$ is sufficiently small. Then we have a relative minimum at
$(a_1, \ldots, a_n)$ if the difference is positive and a relative maximum at $(a_1, \ldots, a_n)$ if the
difference is negative.

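The proof of what has been said depends on a straightforward generalization
from $F(x, y, z)$ to $F(x_1, \ldots, x_n)$ in Exercise 14 of §6.9. This shows that by a
suitable rotation of axes we can transform the quadratic form to one having only
squares,

$$\sum_{i,j=1}^{n} F_{ij}(a_1, \ldots, a_n)\, h_i h_j = \sum_{i=1}^{n} \lambda_i k_i^2, \tag{7.6-15}$$

where the coefficients $\lambda_i$ are roots of the equation

$$\begin{vmatrix} F_{11} - \lambda & F_{12} & \cdots & F_{1n} \\ F_{21} & F_{22} - \lambda & \cdots & F_{2n} \\ \vdots & \vdots & & \vdots \\ F_{n1} & F_{n2} & \cdots & F_{nn} - \lambda \end{vmatrix} = 0, \tag{7.6-16}$$
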

and $F_{ij}$, of course, is an abbreviation for $F_{ij}(a_1, \ldots, a_n)$. It is not hard to see that
this determinant is a polynomial of degree $n$ in the variable $\lambda$, and therefore we
can think of the equation in the form

$$c_n \lambda^n + c_{n-1} \lambda^{n-1} + \cdots + c_1 \lambda + c_0 = 0, \tag*{(7.6-16)$'$}$$

in which $c_n = (-1)^n$, while the constant term, $c_0$, is given
by

$$\begin{vmatrix} F_{11} & F_{12} & \cdots & F_{1n} \\ F_{21} & F_{22} & \cdots & F_{2n} \\ \vdots & \vdots & & \vdots \\ F_{n1} & F_{n2} & \cdots & F_{nn} \end{vmatrix}.$$

If this determinant is not zero, then we say that the critical point is
nondegenerate. The significant thing about nondegenerate critical points is
immediately apparent from (7.6-16)$'$: none of the $\lambda_i$’s can be zero. It is also
clear that if the critical point is degenerate, that is, if the above determinant is
zero at the critical point, then at least one of the $\lambda_i$’s is zero.

We should perhaps remind ourselves at this point that it follows from the
generalization of Exercise 14 of §6.9 that, whether the critical point is degenerate
or nondegenerate, all the $\lambda_i$’s have to be real. This is seen from the fact that the
$\lambda_i$’s arise there as a sequence of extreme values of a real function under a
sequence of increasingly limiting constraints. To those who have studied matrix
theory it is also obvious from the fact that the $\lambda_i$’s are characteristic values of a
real symmetric matrix.

It is obvious from (7.6-15) that the quadratic form cannot change sign if all
the $\lambda_i$’s are positive or if they are all negative. However, if some are positive and
others are negative then the quadratic form does change sign. Quadratic forms
which are positive (negative) except when all the variables are zero are said to
be positive (negative) definite. If a form is never negative (positive) but may
vanish when not all the variables are zero, it is called positive (negative)
semidefinite. Quadratic forms which take both positive and negative values are
said to be indefinite.

The ordered $n$-tuple $(k_1, \ldots, k_n)$ in (7.6-15) represents the same vector
after the rotation of axes as was represented by $(h_1, \ldots, h_n)$ in the original
coordinate system. The individual components of a vector change under a
rotation of axes but length does not. Therefore $\sum_{i=1}^{n} h_i^2 = \sum_{i=1}^{n} k_i^2$. Using this fact we
can infer from (7.6-14) and (7.6-13) that the sign of $F(x_1, \ldots, x_n) - F(a_1, \ldots, a_n)$
is the same as the sign of $\sum_{i=1}^{n} \lambda_i k_i^2 + R$, where $|R| \le \left( \sum_{i=1}^{n} \epsilon_i^2 \right)^{1/2} \sum_{i=1}^{n} k_i^2$. Consider first
the case where all the $\lambda_i$’s are positive and let the smallest one be $m$. Then from
the above we see that

$$\sum_{i=1}^{n} \lambda_i k_i^2 + R \ge \left[ m - \left( \sum_{i=1}^{n} \epsilon_i^2 \right)^{1/2} \right] \sum_{i=1}^{n} k_i^2,$$

and the $\epsilon_i$’s will be so small that this is positive if we stay close enough to
$(a_1, \ldots, a_n)$. Similarly, turning now to the case where all the $\lambda_i$’s are negative and
the largest is $-m$, we can write

$$\sum_{i=1}^{n} \lambda_i k_i^2 + R \le \left[ -m + \left( \sum_{i=1}^{n} \epsilon_i^2 \right)^{1/2} \right] \sum_{i=1}^{n} k_i^2.$$

The right-hand side will be negative if we constrain the $\epsilon_i$’s to be small enough
by staying sufficiently close to $(a_1, \ldots, a_n)$.

So in summary we can say that if the $\lambda_i$’s are all positive, $F(x_1, \ldots, x_n) >
F(a_1, \ldots, a_n)$ for all $(x_1, \ldots, x_n)$ sufficiently close to $(a_1, \ldots, a_n)$ and thus the
critical point is one where $F$ has a local minimum. Similarly, if all the $\lambda_i$’s are
negative, then $F(x_1, \ldots, x_n) < F(a_1, \ldots, a_n)$ throughout some neighborhood of the
critical point and hence $F$ has a local maximum. Actually we have proved a little
bit more than this. By definition, $F$ has a local minimum at $(a_1, \ldots, a_n)$ in case
there is some neighborhood of this point at every other point of which the value
of the function is merely greater than or equal to the value at $(a_1, \ldots, a_n)$. We
have shown that at a critical point where all the $\lambda_i$’s are positive, there is a
neighborhood at all other points of which $F$ is strictly greater than $F(a_1, \ldots, a_n)$.
Such a point is not just a local minimum but a strict local minimum. In a similar
way we distinguish between a local maximum and a strict local maximum. Now,
what we have actually proved is that at a critical point where all the $\lambda_i$’s are
positive the function has a strict local minimum and at a critical point where all
the $\lambda_i$’s are negative, the function has a strict local maximum.

These criteria which we have just obtained are not very practical because
they give no way of telling whether the $\lambda_i$’s are all positive or all negative and it
would usually not be feasible to compute all these numbers by solving (7.6-16)$'$.
Fortunately there is a method, sometimes known as Gundelfinger’s rule, which
can, without finding all the $\lambda_i$’s, tell us whether they are all of the same sign. To
use it we begin by computing $d_1, d_2, \ldots, d_n$ defined as follows:

$$d_1 = F_{11}; \quad d_2 = \begin{vmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{vmatrix}; \quad \ldots; \quad d_n = \begin{vmatrix} F_{11} & F_{12} & \cdots & F_{1n} \\ F_{21} & F_{22} & \cdots & F_{2n} \\ \vdots & \vdots & & \vdots \\ F_{n1} & F_{n2} & \cdots & F_{nn} \end{vmatrix}.$$

Gundelfinger’s rule tells us that:

(i) A necessary and sufficient condition that all the $\lambda_i$’s be positive is that all the $d_i$’s be
positive.

(ii) A necessary and sufficient condition that all the $\lambda_i$’s be negative is that the $d_i$’s
alternate in sign with $d_1$ being negative.


Both (i) and (ii) apply only to critical points where all the $\lambda_i$’s are different from
zero, that is, to nondegenerate critical points, but they do enable us to classify
nondegenerate critical points by looking at the easily obtainable $d_i$’s instead of
the hard-to-find $\lambda_i$’s. Our previous arguments allow us to assert immediately
that:

(i) A necessary and sufficient condition that a nondegenerate critical point give a
strict local minimum is that all the $d_i$’s be greater than zero.

(ii) A necessary and sufficient condition that a nondegenerate critical point give a
strict local maximum is that the $d_i$’s be alternately less than zero and greater than
zero with $d_1$ being less than zero.

(iii) A nondegenerate critical point which is neither a strict local minimum nor a strict
local maximum is a saddle point, where the function has a strict local maximum
from some directions and a strict local minimum from others.

There are extensions of Gundelfinger’s rule which enable us to classify
degenerate critical points, where the maxima and minima are nonstrict, but we
shall not include these details.
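The classification by the determinants $d_1, \ldots, d_n$ lends itself to a short computation. The following Python sketch is one possible way to carry it out, not a prescribed method: the function names are hypothetical, the matrix of second derivatives $F_{ij}$ is assumed to be supplied as a symmetric list of rows, and the determinant is computed by cofactor expansion, which is adequate only for small $n$.

```python
def determinant(matrix):
    """Determinant by cofactor expansion along the first row (small n only)."""
    n = len(matrix)
    if n == 1:
        return matrix[0][0]
    total = 0
    for col in range(n):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * matrix[0][col] * determinant(minor)
    return total


def leading_minors(second_derivatives):
    """Return [d_1, ..., d_n], the determinants of the upper-left k-by-k blocks."""
    n = len(second_derivatives)
    return [determinant([row[:k] for row in second_derivatives[:k]])
            for k in range(1, n + 1)]


def classify_by_gundelfinger(second_derivatives):
    """Classify a critical point from the matrix (F_ij) evaluated there."""
    d = leading_minors(second_derivatives)
    if d[-1] == 0:
        return "degenerate"
    if all(value > 0 for value in d):
        return "strict local minimum"
    # d_1 < 0, d_2 > 0, d_3 < 0, ... (index 0 corresponds to d_1).
    if all(value < 0 if k % 2 == 0 else value > 0 for k, value in enumerate(d)):
        return "strict local maximum"
    return "saddle point"


print(leading_minors([[2, 1, 1], [1, 2, 1], [1, 1, 2]]))
print(classify_by_gundelfinger([[2, 1, 1], [1, 2, 1], [1, 1, 2]]))
```

Integer entries keep the test `d[-1] == 0` exact; with floating-point second derivatives a tolerance would be the safer choice.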

EXERCISES 

1. Find all the critical points of each of the following functions. Test each critical
point by Theorem IX, and state your conclusion.

(a) $y^2 + 3x^4 - 4x^3 - 12x^2 + 24$.

(b) $x^2 - 12y^2 + 4y^3 + 3y^4$.

(c) $x^4 + y^4 - 2x^2 + 4xy - 2y^2$.

(d) $x^2 y^2 - 5x^2 - 8xy - 5y^2$.

(e) $xy(12 - 3x - 4y)$.

(f) $x^3 y^2 (a - x - y)$, $a > 0$.

(g) $(1 - x)(1 - y)(x + y - 1)$.

(h) $x^2 y (24 - x - y)^3$.

, J 4 8_ 

xy x y xy 

(j) I(x 2 + y 2 )-18x-24y + 5VPT^ + 250. 

(k) $5(x^2 + y^2) - 24x - 32y - 60\sqrt{x^2 + y^2} + 1000$.

(l) $12x \sin y - 2x^2 \sin y + x^2 \sin y \cos y$.

2. If $a$ and $b$ are positive, show that $(a/x) + (b/y) + xy$ has a minimum at its only
critical point. What is the situation if $a$ and $b$ are both negative? if they have opposite
signs?

3. How many critical points has the function $(ax^2 + by^2)e^{-x^2 - y^2}$ if $b > a > 0$? Discuss
the nature of each of them.

4. Find the shortest distance from the point (1, -1, 1) to the surface $z = xy$. Set up
the squared distance as a function of $x$, $y$, find the critical points of the function, and test
them by Theorem IX.

5. Discuss the problem of finding the shortest distance from the point $(0, 0, a)$ to the
surface $z = xy$, where $a > 0$. Proceed as directed in Exercise 4. Separate the cases
$0 < a \le 1$ and $1 < a$.

6. If $z$ is defined as a function of $x$, $y$ by the equation $2x^2 + 2y^2 + z^2 + 8xz - z + 8 = 0$,
find the points $(x, y, z)$ at which $z$ has a relative extreme, and test by second derivatives
for a maximum or minimum in each case.

7. Proceed as directed in Exercise 6, starting from the equation

$$x^2 + 2y^2 + 3z^2 - 2xy - 2yz = 2.$$

8. Suppose that $F$ is defined in the neighborhood of $(a, b)$, and that $F_1(a, b) = 0$,
$F_{11}(a, b) < 0$. Why is it impossible for $F$ to have a relative minimum at $(a, b)$?

9. Locate the critical points of the function $xyz(x + y + z - 1)$. Show that there are
six lines all of whose points are degenerate critical points, and one nondegenerate critical
point. Is this a maximum or a minimum point? Can you answer this last question without
second derivative tests?

10. Show that every critical point of the function

$$\frac{x^3 + y^3 + z^3}{xyz}$$

is degenerate.

11. Locate the critical points of the function

$$F(x, y, z) = (ax^2 + by^2 + cz^2)e^{-x^2 - y^2 - z^2},$$

where $a > b > c > 0$. Show that there are two points of maximum value of $F$, one point of
minimum value, and four critical points at which there is neither a maximum nor a
minimum.

12. Study the function $F(x, y) = (y^2 - x^2)(y^2 - 2x^2)$. Show that there are four lines
which divide the plane into eight regions, in each of which $F$ has a constant sign. Discuss
the critical point of the function. Are there any relative extrema?

13. Discuss the sign of the function $F(x, y) = (2x^2 - y)(x^2 - y)$ at various points of the
plane, by appropriate consideration of the regions into which the plane is divided by the
two parabolas $y = x^2$, $y = 2x^2$. Discuss the critical points of the function. Show that, along
every straight line through the origin, the values of $F$ reach a minimum at (0, 0) but that $F$
has neither a maximum nor a minimum at (0, 0).


MISCELLANEOUS EXERCISES 

1. Generalize Theorem I of §6.3 to functions of n variables. 

2. Suppose $f(x, y)$ is differentiable at $(a, b)$, with $A = f_1(a, b)$, $B = f_2(a, b)$. Let
$F(r, \theta) = f(a + r \cos \theta, b + r \sin \theta)$. Then $F_1(0, \theta)$ exists and is equal to $A \cos \theta + B \sin \theta$,
for every $\theta$. Prove this directly from the definition of differentiability of $f$ and the fact
that $F_1(0, \theta) = \lim_{r \to 0} (1/r)\{F(r, \theta) - F(0, \theta)\}$.

3. If F(x, y) = (1 - x)(l - y)(x + y - 1), a = b = b write Taylor’s series for 
F(a + h, b + k). What do you conclude about the sign of the difference F(a + h,b + k)- 
F(a, b) when h and k are small? 

4. Define $f(x, y) = (x^3 - y^3)/(x^2 + y^2)$ if $x^2 + y^2 \neq 0$, and $f(0, 0) = 0$. If we introduce
cylindrical co-ordinates $(r, \theta, z)$ in the usual way, the surface $z = f(x, y)$ is represented by
$z = r(\cos^3 \theta - \sin^3 \theta)$. Observe that the surface consists of a bundle of half-lines; the
half-line corresponding to a fixed value of $\theta$ starts at the origin and passes through the
cylinder $x^2 + y^2 = 1$ at a point for which $z = \cos^3 \theta - \sin^3 \theta$.

By plotting the curve $z = \cos^3 \theta - \sin^3 \theta$ with $\theta$ and $z$ treated as plane rectangular
co-ordinates, and then rolling up the plane to form a cylinder, one can visualize the surface.
Do this. Does the surface have a tangent plane at the origin?

5. Suppose that / and <J> are functions of a single variable, and that each function has 
continuous first and second derivatives. We shall suppose that (f>(a) = c^ 0 and that 

¥■ 0. Let F(x, y, z) = /(x) + /(y) + /(z), G(x, y, z) = <f>(x)<f>(y)<j)(z). Consider the 
extremal problem for F(x, y, z) subject to the constraint G(x, y, z) = c 2 . Show that a 
possible solution of the problem occurs when x = y = z = a, and that the extreme will be a 
relative minimum if 



4>(a) 



and a relative maximum if the inequality is reversed. As instances consider: first,
$f(x) = x^2$, $\phi(x) = x$, $a > 0$; and second, $f(x) = e^{-x^2}$, $\phi(x) = x$.



8 / IMPLICIT-FUNCTION THEOREMS


8 / THE NATURE OF THE PROBLEM OF IMPLICIT FUNCTIONS 

We have already acquired some familiarity with implicit functions, in §§6.1 and 
6.6. Thus far what we have learned about implicit functions has been concerned 
almost entirely with techniques for differentiating such functions in concrete 
special cases (§6.1), or with general formulas for the derivatives of implicit 
functions in terms of functional notation (§6.6). In all this earlier work we have 
taken for granted the existence and differentiability of the implicit functions. In 
this chapter we shall inquire into these matters which have been taken for 
granted. 

Consider then an equation in three variables,

$$F(x, y, z) = 0. \tag{8-1}$$

In certain situations we say that an equation of this form has a solution

$$z = f(x, y). \tag{8-2}$$

Our present purpose is to examine the following questions: What does it mean to
say that (8-2) is a solution of (8-1)? Under what conditions is a solution
possible? What can be said about the differentiability of the function $f$ in terms
of what may be known about the function $F$? The answers to these and related
questions will occupy us in this chapter.

Questions of the same sort can be asked about other implicit-function
situations. We may have $y = f(x)$ as a solution of $F(x, y) = 0$, or $z =
f(u, v, w, x, y)$ as a solution of $F(u, v, w, x, y, z) = 0$. Or, there may be several
functions which arise as solutions of a system of several equations. Under
suitable conditions a set of $r$ equations in $n + r$ variables may determine $r$ of the
variables as functions of the remaining $n$ variables.

The nature of the problem is most easily understood, and the explanation of 
the theory is the simplest, in the case of an implicit function arising as a solution 
of a single equation in a small number of variables. We shall discuss the case of 
two variables first, and then the case of three variables. The discussion in this 
section is intended to provide motivation for, and understanding of, the formal 
statements of theorems in later sections. 

Let $F$ be a function of $x$ and $y$, defined in a certain region of the $xy$-plane.
Consider the equation

$$F(x, y) = 0. \tag{8-3}$$

This equation expresses a condition which a point $(x, y)$ may or may not satisfy.
If there are some points which satisfy the condition, the set of all such points
may be called the locus defined by (8-3). We know that, in many cases, the locus
is some kind of curve. For instance, if $F(x, y) = 4x^2 + 9y^2 - 36$, the locus is an
ellipse. Now let $f(x)$ be a function defined for a certain set of values of $x$. We
say that $y = f(x)$ is a solution of (8-3) if all the points $(x, f(x))$ are part of the
locus defined by (8-3), that is, if $F(x, f(x)) = 0$ for all the values of $x$ which are
involved in the definition of the function $f$. We assume without any further
explicit mention that all functions which enter the discussion are single-valued.

In our work with implicit functions we do not attempt to get a solution of
(8-3) in the form $y = f(x)$ such that it gives us all the locus defined by (8-3), for
this may be impossible. Thus in the case of the ellipse $4x^2 + 9y^2 - 36 = 0$, part of
the locus is given by the graph of $y = \tfrac{1}{3}(36 - 4x^2)^{1/2}$, and another part of the locus
by the graph of $y = -\tfrac{1}{3}(36 - 4x^2)^{1/2}$. What we do attempt is to start with a
particular point $(x_0, y_0)$ of the locus defined by (8-3), and then to obtain a
function $f(x)$, defined in some interval $x_0 - a < x < x_0 + a$, such that $y = f(x)$ is a
solution of (8-3), and such that, in a sufficiently restricted neighborhood of
$(x_0, y_0)$, all the points for which $F(x, y) = 0$ are given by $y = f(x)$. This localization
of the problem to a neighborhood of a particular point $(x_0, y_0)$ is characteristic
of all the treatment of implicit-function problems in this chapter.
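The two branches of the ellipse mentioned above can be checked numerically. The sketch below is an illustration only; the names `F`, `upper_branch` and `lower_branch` and the sample abscissas are assumptions introduced here. It confirms that each branch satisfies $F(x, f(x)) = 0$ on $-3 < x < 3$, and that neither branch alone gives the whole locus.

```python
import math


def F(x, y):
    """Left member of the ellipse equation 4x^2 + 9y^2 - 36 = 0."""
    return 4 * x * x + 9 * y * y - 36


def upper_branch(x):
    """The solution y = (1/3)(36 - 4x^2)^(1/2), valid for |x| <= 3."""
    return math.sqrt(36 - 4 * x * x) / 3


def lower_branch(x):
    """The solution y = -(1/3)(36 - 4x^2)^(1/2), valid for |x| <= 3."""
    return -math.sqrt(36 - 4 * x * x) / 3


for x in (-2.5, 0.0, 1.0, 2.9):
    # Both residuals should vanish up to rounding error.
    print(x, F(x, upper_branch(x)), F(x, lower_branch(x)))
```

Near a point such as (0, 2) only the upper branch passes through a small neighborhood, which is the sense in which the problem is localized.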

Fig. 52 shows how, in the case of the equation $4x^2 + 9y^2 - 36 = 0$, localization
of attention to a suitable neighborhood of one particular point of the locus leads
to the determination of a solution $y = f(x)$ whose graph comprises all that part of
the locus which is in the neighborhood. In Fig. 52 two such localizations are
shown, the neighborhood being rectangular in each case. Observe that, if the
center of one of the rectangular neighborhoods is a point $(x_0, y_0)$ on the ellipse,
the permissible size of the rectangle is governed by the consideration that every
line parallel to the $y$-axis and passing through the interior of the rectangle shall
intersect the ellipse exactly once inside the rectangle. It is always possible to
choose the rectangle so as to satisfy this condition provided that the point $(x_0, y_0)$
is not one of the points $(\pm 3, 0)$. These are the points where $\partial F/\partial y = 0$, with
$F(x, y) = 4x^2 + 9y^2 - 36$. In our study of functions $y = f(x)$ defined implicitly as
solutions of $F(x, y) = 0$ we shall always localize the problem within a neighborhood
of a point at which $\partial F/\partial y \neq 0$. If we wanted to solve for $x$ in terms of $y$
instead of for $y$ in terms of $x$, we would impose the requirement $\partial F/\partial x \neq 0$.

Fig. 52.

Let us turn now to a consideration of equations in three variables. Suppose that
$F(x, y, z)$ is defined in a certain region in three-dimensional space. We consider the
locus defined by the equation

$$F(x, y, z) = 0, \tag{8-4}$$

and suppose that $(x_0, y_0, z_0)$ is a point of this locus. We then confine our attention
to points near $(x_0, y_0, z_0)$, and ask the following question: Is it possible to find a
rectangular box defined by certain inequalities

$$|x - x_0| < a, \quad |y - y_0| < b, \quad |z - z_0| < c$$

such that every line parallel to the $z$-axis and passing through the interior of the
box intersects the locus defined by (8-4) exactly once inside the box? If so, then
to each pair $(x, y)$ for which $|x - x_0| < a$ and $|y - y_0| < b$ there corresponds a
unique $z$ such that $|z - z_0| < c$ and $F(x, y, z) = 0$. This defines $z$ as a single-valued
function of $(x, y)$, say $z = f(x, y)$, and gives a solution of (8-4), that is,
$F(x, y, f(x, y)) = 0$.

Under certain conditions it is possible to choose a box of the sort just
described. We shall explain the plausibility of this statement from a geometrical
point of view. Suppose that the locus defined by (8-4) is a surface, such as an
ellipsoid, a hyperboloid, or a cone. Suppose that the surface is smooth, and that
the tangent plane at the point $(x_0, y_0, z_0)$ is not parallel to the $z$-axis. Then we
expect the part of the surface near $(x_0, y_0, z_0)$ to be almost like the nearby part of
the tangent plane; it should then be represented by an equation $z = f(x, y)$, since
each line parallel to the $z$-axis may be expected to intersect the surface exactly
once in the vicinity of $(x_0, y_0, z_0)$, provided that $x - x_0$ and $y - y_0$ are sufficiently
small (see Fig. 53).

The condition on $F$ in order that the plane tangent to the surface at
$(x_0, y_0, z_0)$ shall not be parallel to the $z$-axis is $\dfrac{\partial F}{\partial z} \neq 0$ at the point, for the ratios
$\dfrac{\partial F}{\partial x} : \dfrac{\partial F}{\partial y} : \dfrac{\partial F}{\partial z}$ define the direction of the normal to the surface; therefore $\dfrac{\partial F}{\partial z} = 0$
means that the normal is perpendicular to the $z$-axis.

Our discussions of the implicit-function problem for the two cases $F(x, y) = 0$
and $F(x, y, z) = 0$, with geometrical evidence for the solutions $y = f(x)$ and
$z = f(x, y)$ in a localized form of the problem, suggest the kind of answers we
may expect to some of the questions raised at the beginning of the present
section. We do not yet have any real proofs, however. The proofs must come
out of the analytical situation, for we do not really know the facts about
geometry of the curves and surfaces except by an examination of the functions
and equations.


8.1 / THE FUNDAMENTAL THEOREM 

The following theorem is concerned with a precise statement bearing on the 
questions raised at the outset in §8. 
THEOREM I. Let $F(x, y, z)$ be a function defined in an open set $S$ containing the
point $(x_0, y_0, z_0)$. Suppose that $F$ has continuous first partial derivatives in $S$.
Furthermore assume that

$$F(x_0, y_0, z_0) = 0, \quad F_3(x_0, y_0, z_0) \neq 0.$$

Under these conditions there exists a box-like region defined by certain
inequalities

$$|x - x_0| < a, \quad |y - y_0| < b, \quad |z - z_0| < c,$$

lying in the region $S$ and such that the following assertions are true:

Let $R$ be the rectangular region $|x - x_0| < a$, $|y - y_0| < b$ in the $xy$-plane.
Then

1. For any $(x, y)$ in $R$ there is a unique $z$ such that

$$|z - z_0| < c \quad \text{and} \quad F(x, y, z) = 0.$$

Let us express this dependence of $z$ on $(x, y)$ by writing

$$z = f(x, y). \tag{8.1-1}$$

2. The function $f$ is continuous in $R$.

3. The function $f$ has continuous first partial derivatives given by

$$f_1(x, y) = -\frac{F_1(x, y, z)}{F_3(x, y, z)}, \quad f_2(x, y) = -\frac{F_2(x, y, z)}{F_3(x, y, z)},$$

where $z$ is given by (8.1-1).
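The derivative formula in assertion 3 can be tested on a case where the implicit function is known explicitly. The sketch below assumes, purely for illustration, the sphere $F(x, y, z) = x^2 + y^2 + z^2 - 1$ near the point (0, 0, 1), where $F_3 = 2z \neq 0$ and $f(x, y) = (1 - x^2 - y^2)^{1/2}$; the function names and the step size are choices made here, not part of the theorem.

```python
import math


def f(x, y):
    """Explicit solution z = f(x, y) of x^2 + y^2 + z^2 - 1 = 0 near (0, 0, 1)."""
    return math.sqrt(1 - x * x - y * y)


def f1_from_theorem(x, y):
    """f_1 = -F_1 / F_3, with F_1 = 2x and F_3 = 2z evaluated at z = f(x, y)."""
    z = f(x, y)
    return -(2 * x) / (2 * z)


def f1_by_difference(x, y, step=1e-6):
    """Central-difference estimate of the partial derivative of f in x."""
    return (f(x + step, y) - f(x - step, y)) / (2 * step)


print(f1_from_theorem(0.3, 0.4))
print(f1_by_difference(0.3, 0.4))
```

The two printed values should agree to many decimal places, as the theorem predicts for any point of $R$ where the hypotheses hold.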


Proof. The first part of the proof is concerned with determining suitable
values for the positive constants $a$, $b$, $c$ which are mentioned in the theorem. Let
$A$ be a rectangular parallelepiped (box) with center at $(x_0, y_0, z_0)$ such that the
whole of $A$ is entirely in the region $S$, and such that, moreover, $F_3(x, y, z)$ has
everywhere in $A$ the same sign which it has at $(x_0, y_0, z_0)$. This choice of $A$ is
possible since $S$ is open and $F_3$ is continuous (see Theorem III, §5.3).

For definiteness let us assume $F_3 > 0$ in $A$. Consider the top and bottom
faces of the box $A$. If we denote the height of the box by $2c$, these faces will lie
in the planes $z = z_0 \pm c$. Since $F_3 > 0$, the value of $F$ increases as we go upward
along any line parallel to the $z$-axis. Since $F(x_0, y_0, z_0) = 0$, it follows that

$$F(x_0, y_0, z_0 + c) > 0 \quad \text{and} \quad F(x_0, y_0, z_0 - c) < 0.$$

Because of the continuity of $F$ we see that $F$ will be positive in a small rectangle
with center at $(x_0, y_0, z_0 + c)$ in the plane $z = z_0 + c$, and negative in a small
rectangle with center at $(x_0, y_0, z_0 - c)$ in the plane $z = z_0 - c$. Let us choose
positive numbers $a$, $b$, so that these rectangles are determined by the inequalities

$$|x - x_0| < a, \quad |y - y_0| < b.$$

We also take care to choose $a$ and $b$ so that the box $B$ defined by the
inequalities

$$|x - x_0| < a, \quad |y - y_0| < b, \quad |z - z_0| < c$$

is no larger than the box $A$ (see Fig. 54).

Now consider the value of $F$ along the segment in which a line parallel to
the $z$-axis intersects the box $B$. As we go up along this segment the value of $F$
increases. At the lower end of the segment, $F < 0$, while at the upper end $F > 0$.
Hence there is just one point on the segment at which $F = 0$; for a given pair
$(x, y)$, the $z$-co-ordinate of this point is denoted by $z = f(x, y)$ (see Fig. 55).
Assertion 1 of the theorem is now proved. Observe that thus far we have made
no use of the partial derivatives $F_1$, $F_2$.

Having obtained the function f(x, y) by the foregoing argument, let us now
prove that it is continuous. To prove continuity at (x_0, y_0) we must show that,
given ε > 0, there exists δ > 0 such that

|f(x, y) - f(x_0, y_0)| < ε when |x - x_0| < δ and |y - y_0| < δ. (8.1-2)

We may assume that ε ≤ c. Now f(x_0, y_0) = z_0; also,

F(x_0, y_0, z_0 + ε) > 0, F(x_0, y_0, z_0 - ε) < 0.

Hence, by the very argument used in proving assertion 1 of the theorem, we see
that if we choose δ > 0 so that

F(x, y, z_0 + ε) > 0 and F(x, y, z_0 - ε) < 0

when

|x - x_0| < δ and |y - y_0| < δ, (8.1-3)

then to each (x, y) satisfying (8.1-3) corresponds a unique z such that |z - z_0| < ε
and F(x, y, z) = 0. This z must be equal to f(x, y), however, by 1. Thus (8.1-2)
holds, and the continuity at (x_0, y_0) is proved.

To prove continuity of f at any other point (x_1, y_1) of R, let z_1 = f(x_1, y_1).
Observe that F satisfies at (x_1, y_1, z_1) the same hypotheses as are stated in the
theorem relative to the point (x_0, y_0, z_0). Hence, by what has already been proved
(as applied to this new situation), all the points (x, y, z) in the vicinity of
(x_1, y_1, z_1) such that F(x, y, z) = 0 are furnished by a single-valued function (let us
call it g(x, y)) which is continuous at (x_1, y_1). However, since all these points are
in the box B, the uniqueness clause of conclusion 1 of the theorem assures us
that f(x, y) = g(x, y) when (x, y) is near (x_1, y_1). Hence f is continuous at (x_1, y_1).


It remains only to prove assertion 3 of the theorem. We shall deal with ∂f/∂x;
the treatment of ∂f/∂y is different only in the letters used. We employ the law of the
mean (§7.4). Let (x, y) be a point of R, and let z = f(x, y). We wish to show that

\[ \lim_{\Delta x \to 0} \frac{f(x + \Delta x, y) - f(x, y)}{\Delta x} = -\frac{F_1(x, y, z)}{F_3(x, y, z)}. \tag{8.1-4} \]

We assume that Δx is so small that (x + Δx, y) is also in R, and write

Δz = f(x + Δx, y) - f(x, y).

Now, considering F as a function of x and z only, we have by the law of the
mean,

F(x + Δx, y, z + Δz) - F(x, y, z) = Δx F_1(X, y, Z) + Δz F_3(X, y, Z), (8.1-5)

where

X = x + θ Δx, Z = z + θ Δz, 0 < θ < 1.

The left member of (8.1-5) is zero, by the definition of the function f. Hence

\[ \frac{\Delta z}{\Delta x} = -\frac{F_1(X, y, Z)}{F_3(X, y, Z)}. \tag{8.1-6} \]

Now Δz → 0 when Δx → 0, by the continuity of f; therefore X → x and Z → z. The
truth of (8.1-4) is now seen to follow from (8.1-6), because F_1 and F_3 are
continuous. The formula

\[ f_1(x, y) = -\frac{F_1(x, y, f(x, y))}{F_3(x, y, f(x, y))} \]

has now been established. From it we see that f_1 is continuous (see Theorem IV,
§5.3). This completes the proof.
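The construction in the proof can be imitated numerically. The following is a minimal sketch, not part of the original text: the function F (the unit sphere), the box limits 0.5 and 1.5, and all helper names are illustrative assumptions. Bisection along the vertical segment plays the role of the sign argument used for assertion 1, and a central difference quotient is compared with the formula f_1 = -F_1/F_3 of assertion 3.

```python
import math


def F(x, y, z):
    # Illustrative choice (an assumption, not from the text): the unit sphere.
    # F_3 = 2z > 0 near the point (0, 0, 1).
    return x * x + y * y + z * z - 1.0


def solve_z(x, y, z_lo, z_hi, tol=1e-12):
    """Find the unique zero of F on the vertical segment z_lo <= z <= z_hi.

    Assumes F < 0 at the lower end, F > 0 at the upper end, and F increasing
    in z, which is the situation arranged in the proof.
    """
    if not (F(x, y, z_lo) < 0.0 < F(x, y, z_hi)):
        raise ValueError("sign condition on the segment is not satisfied")
    lo, hi = z_lo, z_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(x, y, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def f(x, y):
    # Hypothetical box B around (0, 0, 1): |z - 1| < 0.5.
    return solve_z(x, y, 0.5, 1.5)


def f1_formula(x, y):
    # Assertion 3: f_1 = -F_1 / F_3, evaluated at z = f(x, y).
    z = f(x, y)
    return -(2.0 * x) / (2.0 * z)


def f1_difference(x, y, h=1e-6):
    # Central difference quotient in x, for comparison with the formula.
    return (f(x + h, y) - f(x - h, y)) / (2.0 * h)


if __name__ == "__main__":
    print(f(0.3, 0.2), math.sqrt(0.87))
    print(f1_formula(0.3, 0.2), f1_difference(0.3, 0.2))
```

Bisection is used here only because it mirrors the proof; it needs nothing beyond continuity of F and the sign change on the segment.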


8.2 / GENERALIZATION OF THE FUNDAMENTAL THEOREM 

In the theorem of §8.1 we dealt with the existence of a function z = f(x, y)
defined implicitly by an equation F(x, y, z) = 0. We chose to state the theorem
for the case of three variables, but nothing essential in the theorem or its proof is
really dependent upon the particular number three. The analytical details and the
geometrical language presented in §8.1 may all be modified easily to meet the
situation of a different number of variables. We shall state the theorem formally
in the general case (n + 1 variables). The proof will be omitted. The theorem tells
us what we can be certain of, under appropriate conditions, in speaking of a
function

z = f(x_1, ..., x_n)

defined implicitly by an equation of the form

F(x_1, x_2, ..., x_n, z) = 0.

We use geometrical language in speaking of the regions of definition of the
above functions.

THEOREM II. Let F(x_1, ..., x_n, z) be defined in an (n + 1)-dimensional
neighborhood of the point (a_1, ..., a_n, c). Suppose that F has continuous partial
derivatives in this neighborhood, and furthermore, assume that

F(a_1, ..., a_n, c) = 0, F_{n+1}(a_1, ..., a_n, c) ≠ 0.

Under these conditions there exists a box-like region defined by certain
inequalities

|x_1 - a_1| < A_1, ..., |x_n - a_n| < A_n, |z - c| < C,

lying in the above neighborhood, and such that the following assertions are
true:

Let R be the n-dimensional region

|x_1 - a_1| < A_1, ..., |x_n - a_n| < A_n

in the space of the variables x_1, ..., x_n. Then

1. For any (x_1, ..., x_n) in R there is a unique z such that

|z - c| < C and F(x_1, ..., x_n, z) = 0.

Let us express this dependence of z on (x_1, ..., x_n) by writing

z = f(x_1, ..., x_n).

2. The function f is continuous in R.

3. The function f has continuous first partial derivatives given by

\[ \frac{\partial f}{\partial x_i}(x_1, \ldots, x_n) = -\frac{F_i(x_1, \ldots, x_n, z)}{F_{n+1}(x_1, \ldots, x_n, z)}, \]

where z = f(x_1, ..., x_n).
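Assertion 3 of Theorem II can be checked numerically in a case with n = 3. The sketch below is an illustration only: the function F, the use of Newton's method to compute z, and all names are assumptions introduced here, chosen so that F_{n+1} = 1 + 3z^2 never vanishes.

```python
def F(x, z):
    # Illustrative F(x1, x2, x3, z), an assumption rather than an example
    # from the text. It vanishes at (0, 0, 0, 0).
    x1, x2, x3 = x
    return x1 + x2 ** 2 + x3 ** 3 + z + z ** 3


def F_z(x, z):
    # Partial derivative with respect to the last variable (F_{n+1}).
    return 1.0 + 3.0 * z * z


def F_x(x, z):
    # Partial derivatives F_1, F_2, F_3 with respect to x1, x2, x3.
    x1, x2, x3 = x
    return (1.0, 2.0 * x2, 3.0 * x3 ** 2)


def f(x, z0=0.0, steps=50):
    # Solve F(x, z) = 0 for z by Newton's method, starting near c = 0.
    z = z0
    for _ in range(steps):
        z -= F(x, z) / F_z(x, z)
    return z


def grad_formula(x):
    # Theorem II, assertion 3: df/dx_i = -F_i / F_{n+1} at z = f(x).
    z = f(x)
    return tuple(-Fi / F_z(x, z) for Fi in F_x(x, z))


def grad_difference(x, h=1e-6):
    # Central differences of the numerically computed f, for comparison.
    result = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        result.append((f(tuple(xp)) - f(tuple(xm))) / (2.0 * h))
    return tuple(result)


if __name__ == "__main__":
    point = (0.1, 0.2, 0.3)
    print(grad_formula(point))
    print(grad_difference(point))
```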


EXERCISES 

1. Is it true that the part of the locus defined by x + y + z - sin xyz = 0 near the
point (0, 0, 0) can be represented in the form z = f(x, y)?

2. How can you be sure that the equation e^z (x^2 + y^2 + z^2) - √(1 + z^2) + y = 0 has a
solution z = f(x, y) which is continuous at x = 1, y = 0, with f(1, 0) = 0? Using the tangent
plane as an approximation to the surface, calculate f(1 + h, k) approximately when h and
k are small.





3. Can the equation (x^2 + y^2 + z^2)^{1/2} - cos z = 0 be solved uniquely for y in terms of
x and z in the neighborhood of the point (0, 1, 0)? Can it be solved uniquely for z in terms
of x and y in such a neighborhood?

4. Does there exist a function f(x, y) continuous at (1, -1), with f(1, -1) = 0, and
such that x^3 + y^3 + [f(x, y)]^3 = 3xy f(x, y) at all points of a neighborhood of (1, -1)?

5. Show that z^3 + (x^2 + y^2)z + 1 = 0 has a unique solution z = f(x, y) defined for all
x, y, and that f has continuous first partial derivatives everywhere.

6. Does the function x(y - 1)√z + x^2 z^3 + sin x satisfy the hypotheses of Theorem I
in a neighborhood of the point (0, 0, 0)?

7. For the purposes of this exercise let us make the following definition: A point
(x_0, y_0, z_0) of the locus defined by F(x, y, z) = 0 is called a regular point of the locus if the
part of the locus in a sufficiently small neighborhood of the point can be represented in at
least one of the three forms z = f(x, y), x = g(y, z), y = h(z, x), where f is defined and
continuous for all values of (x, y) sufficiently close to (x_0, y_0), f(x_0, y_0) = z_0, and
corresponding requirements are placed on the functions g, h.

(a) Now suppose that F has continuous first partial derivatives in a neighborhood of
(x_0, y_0, z_0), that F(x_0, y_0, z_0) = 0, and that (∂F/∂x)^2 + (∂F/∂y)^2 + (∂F/∂z)^2 > 0 at the point. Prove
that (x_0, y_0, z_0) is a regular point of the locus.

(b) Is the origin a regular point of the cone x^2 + y^2 - z^2 = 0?

(c) What is the locus defined by the equation (x^2 + y^2 + z^2)^2 - a^2 (x^2 + y^2 + z^2) = 0, a ≠ 0?
Are there any non-regular points of the locus?

(d) If a, b, c are positive, show that all points except (0, 0, 0) of the locus defined by
(x^2 + y^2 + z^2)^2 - a^2 x^2 - b^2 y^2 - c^2 z^2 = 0 are regular.

8. In particular cases we may be able to arrive at information about z = f(x, y) as a
solution of F(x, y, z) = 0 by methods quite different from those used to prove Theorem I.
As an example, consider the equation

y e^z + xz - x^2 - y^2 = 0.

Suppose that x and y are fixed, with y ≠ 0. Consider w and z as variables, and draw the
curve w = e^z and the line

\[ w = -\frac{x}{y}\, z + \frac{x^2 + y^2}{y} \]

on the same wz-co-ordinate axes. If the line intersects the curve, the z-co-ordinate of the
point of intersection depends on the parameters x, y, say z = f(x, y), and this gives a
solution of F(x, y, z) = 0, where F(x, y, z) = y e^z + xz - x^2 - y^2.

(a) Plot the graphs of the curve and the line when y ≠ 0 and x/y > 0, and show that there
is a unique intersection.

(b) Show that if x/y < 0 there may be no intersection, one intersection (tangency), or two
intersections. Show that, if x, y, z are such that the line is tangent to the curve, then

∂F/∂z = 0.

9. (a) Give an example of an equation F(x, y) = 0, where F is continuous but the
equation is not satisfied by any points (x, y).

(b) Give another example in which F(x, y) = 0 is satisfied by a certain point (x_0, y_0), but
not by any other points near (x_0, y_0).

(c) Give an example in which the locus defined by F(x, y, z) = 0 is not a surface but a
curve or a straight line.

10. Formulate an exactly worded theorem corresponding to Theorem I, but for the
case of y = f(x) defined as a solution of F(x, y) = 0, localized near a point (x_0, y_0). Give a
detailed proof similar to the proof of Theorem I, and supply appropriate diagrams
analogous to Figs. 54 and 55.

11. Is the locus of y^2 + x^2 e^y = 0 a curve?

12. What is the locus defined by

(e^{sin x} - 1)^2 + (sin y - 1)^2 = 0?

13. Suppose F(x, y) has continuous first partial derivatives in a neighborhood of
(a, b), that F(a, b) = 0, and that (∂F/∂x)^2 + (∂F/∂y)^2 > 0 at (a, b). Explain how you know that
the part of the locus F(x, y) = 0 near (a, b) is a smooth curve without self-intersections.

14. If F(x, y) = x^2 + y^2 - x^3, find the solution y = f(x) of F(x, y) = 0: (a) near the
point (5, 10); (b) near the point (10, -30). (c) Near what points of the locus F(x, y) = 0
is there no solution y = f(x) of the type considered in the discussion of Fig. 52?

15. If F(x, y) = (y - x^2)^2 - x^5, find the solution y = f(x) of F(x, y) = 0: (a) near the
point (1, 0); (b) near the point (1, 2). (c) What is the situation near the point (0, 0)?

16. What condition on a and b is sufficient to guarantee that the equation

y x^2 - y^3 + x^3 - a^2 b - a^3 + b^3 = 0

has a solution y = f(x) where f is continuous at x = a and f(a) = b?

17. Can the part of the curve y^n (x - y) - (x + y) = 0 near the origin be represented in
the form y = f(x)? Assume that n ≥ 1. Draw a straight line segment which represents the
curve approximately near the origin.

18. If x_0 ≠ 0 and x_0 ≠ 1, show that, if (x, y) is sufficiently near (x_0, 0), the equation
sin(x^2 y) - xy = 0 is equivalent to y = 0.

19. (a) Find the unique solution y = f(x) of tan y - xy = 0 in the neighborhood of
(x_0, 0), if x_0 ≠ 1.

(b) Show that, if (x_0, y_0) satisfies tan y - xy = 0, and if y_0 ≠ 0 and cos y_0 ≠ 0, the equation
tan y - xy = 0 defines uniquely a solution y = f(x) such that f is continuous at x_0 and
y_0 = f(x_0).

20. Can the part of the curve x e^y - y + 1 = 0 near the point (e^{-2}, 2) be represented in
the form y = f(x)? Represent it in the form x = g(y) and sketch the curve.

21. Make the following assumptions: (1) F(x, y) is defined when |x - x_0| < h and
|y - y_0| < k; (2) F(x_0, y_0) = 0; (3) F(x, y) is a continuous function of x for each y, and a
continuous function of y for each x; (4) for each fixed x, F(x, y) increases as y increases.
Deduce that there exists a number c, 0 < c ≤ h, and a continuous function f(x), defined
when |x - x_0| < c, such that |f(x) - y_0| < k, and such that the graph of y = f(x) comprises
all points inside the rectangle |x - x_0| < c, |y - y_0| < k at which F(x, y) = 0.

8.3 / SIMULTANEOUS EQUATIONS 

In this section we shall discuss the problem of implicit functions as it arises in
connection with simultaneous equations. We have already considered examples
of the technique of finding the partial derivatives of functions defined implicitly
by simultaneous equations (see §§6.1, 6.6). Our present concern is with existence
theorems analogous to the fundamental theorems I, II given in §§8.1, 8.2.

Suppose we are given two equations in five variables. Experience with
algebraic and trigonometric problems of this kind leads us to expect that, in
certain cases at least, we may solve for two of the variables in terms of the
remaining three. Thus, for instance, from equations

F(x, y, z, u, v) = 0, G(x, y, z, u, v) = 0, (8.3-1)

we may be able to solve for u, v in terms of x, y, z:

u = f(x, y, z), v = g(x, y, z). (8.3-2)

Or, again, if we have three equations in five variables, we may perhaps be
able to express three of the variables in terms of the remaining two. The
functional-notation expression of this situation would be that the equations

F(x, y, u, v, w) = 0, G(x, y, u, v, w) = 0, H(x, y, u, v, w) = 0

give rise to functions

u = f(x, y), v = g(x, y), w = h(x, y).

The general situation suggested by these cases is that in which we are given r 
equations in n + r variables ( n and r positive integers). In certain cases, at least, 
such a set of equations will define r of the variables as functions of the 
remaining n variables. In seeking to understand the implicit-function problem for 
r simultaneous equations, it is easiest to begin with the case r = 2. The total 
number of variables makes very little difference. For definiteness we consider 
the case of two equations in five variables, say in the form (8.3-1). 

The simplest simultaneous-equation systems are the linear systems. Let us,
as a preliminary special case, suppose that equations (8.3-1) are linear in u and
v, but not necessarily linear in x, y and z. Then they will have the form

A_1 u + B_1 v + C_1 = 0,
A_2 u + B_2 v + C_2 = 0, (8.3-3)

where the coefficients A_1, B_1, ..., C_2 all depend on x, y, z. This system of
equations can be solved for u, v in terms of x, y, z provided the determinant

\[ D = \begin{vmatrix} A_1 & B_1 \\ A_2 & B_2 \end{vmatrix} \]

does not vanish. Now, regarding (8.3-3) as a particular instance of (8.3-1), we
see that

F(x, y, z, u, v) = A_1 u + B_1 v + C_1,

and thus

∂F/∂u = A_1, ∂F/∂v = B_1.

Similar equations hold for G, and thus the determinant D is the same as the
Jacobian determinant

\[ \frac{\partial(F, G)}{\partial(u, v)} = \begin{vmatrix} \dfrac{\partial F}{\partial u} & \dfrac{\partial F}{\partial v} \\[1ex] \dfrac{\partial G}{\partial u} & \dfrac{\partial G}{\partial v} \end{vmatrix}. \tag{8.3-4} \]

It will be seen by referring back to §6.6 that this same Jacobian arises in the
denominators of the expressions for the partial derivatives of u and v as
functions of x, y, z, assuming that such functions are defined by equations
(8.3-1).
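The linear special case can be carried out mechanically. The following is a minimal sketch under stated assumptions: the coefficient functions in `coefficients` are hypothetical (the text only says the coefficients depend on x, y, z), and exact rational arithmetic is used so that the check involves no rounding.

```python
from fractions import Fraction


def coefficients(x, y, z):
    # Hypothetical coefficient functions A_1, B_1, C_1, A_2, B_2, C_2 of (x, y, z);
    # any other choice would serve equally well for the illustration.
    return (x, Fraction(1), -z, Fraction(1), y, Fraction(-1))


def solve_uv(x, y, z):
    """Solve A_1 u + B_1 v + C_1 = 0, A_2 u + B_2 v + C_2 = 0 for u, v."""
    A1, B1, C1, A2, B2, C2 = coefficients(Fraction(x), Fraction(y), Fraction(z))
    D = A1 * B2 - A2 * B1  # the determinant D, equal to the Jacobian d(F,G)/d(u,v)
    if D == 0:
        raise ValueError("determinant D vanishes; no unique solution for u, v")
    u = (B1 * C2 - B2 * C1) / D
    v = (A2 * C1 - A1 * C2) / D
    return u, v


if __name__ == "__main__":
    u, v = solve_uv(2, 3, 5)
    print(u, v)
```

When D vanishes the sketch simply refuses to solve, which corresponds to the proviso in the text.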

The foregoing considerations indicate that if we expect to solve equations 
(8.3-1) for u, v, we should make the assumption that the Jacobian (8.3-4) is 
different from zero. It must be kept in mind that in the general (nonlinear) case 
we are concerned not with the actual solution for u, v in the elementary sense of 
expressing u, v in terms of x, y, z by more or less simple formulas, but with the 
solution in the theoretical sense of knowing certainly that there exist functions 
(8.3-2) satisfying equations (8.3-1). A qualified guarantee of the existence of 
solutions in this theoretical sense is furnished by the following theorem: 

THEOREM III. Let S be a neighborhood of the point P_0: (x_0, y_0, z_0, u_0, v_0) in the
5-dimensional space of the co-ordinates x, y, z, u, v. Suppose that the
functions F, G occurring in the system (8.3-1) are continuous and have
continuous first partial derivatives in S. Also assume that both functions
vanish at the point P_0 but that the Jacobian (8.3-4) does not vanish at the
point. Under these conditions there exists a box-like region lying in S, defined
by certain inequalities

|x - x_0| < a, |y - y_0| < b, |z - z_0| < c, (8.3-5)

|u - u_0| < α, |v - v_0| < β, (8.3-6)

such that the following assertions are true:

Let R be the region defined, in the 3-dimensional space of the co-ordinates
x, y, z, by the inequalities (8.3-5). Then

1. To any (x, y, z) in R there corresponds a unique pair of values u, v
such that the inequalities (8.3-6) are satisfied and the functions F, G vanish
(i.e., equations (8.3-1) are satisfied). This correspondence defines u and v as
functions of x, y, z, say

u = f(x, y, z), v = g(x, y, z).

2. The functions f, g are continuous in R.

3. The functions f, g have continuous partial derivatives given by

\[ \frac{\partial f}{\partial x} = -\frac{1}{J}\,\frac{\partial(F, G)}{\partial(x, v)}, \qquad \frac{\partial g}{\partial x} = -\frac{1}{J}\,\frac{\partial(F, G)}{\partial(u, x)}, \tag{8.3-7} \]

and similar formulas with x replaced by y or by z, where

\[ J = \frac{\partial(F, G)}{\partial(u, v)}. \tag{8.3-8} \]
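The derivative formulas (8.3-7) can be tested on a system that happens to be solvable in closed form. The sketch below is an illustration, not an example from the text: it assumes the toy system F = u + v - x, G = uv - y (in which z plays no role), whose solutions u, v are the roots of t^2 - xt + y = 0, and the helper names are hypothetical.

```python
import math


def solve(x, y):
    # Explicit solution of F = u + v - x = 0, G = u*v - y = 0 on the branch u > v.
    s = math.sqrt(x * x - 4.0 * y)
    return (x + s) / 2.0, (x - s) / 2.0


def det2(a, b, c, d):
    # 2x2 determinant | a b ; c d |.
    return a * d - b * c


def derivatives_formula(x, y):
    """Return (du/dx, dv/dx) computed from formulas (8.3-7)."""
    u, v = solve(x, y)
    F_x, F_u, F_v = -1.0, 1.0, 1.0
    G_x, G_u, G_v = 0.0, v, u
    J = det2(F_u, F_v, G_u, G_v)            # d(F,G)/d(u,v)
    du_dx = -det2(F_x, F_v, G_x, G_v) / J   # -(1/J) d(F,G)/d(x,v)
    dv_dx = -det2(F_u, F_x, G_u, G_x) / J   # -(1/J) d(F,G)/d(u,x)
    return du_dx, dv_dx


def derivatives_difference(x, y, h=1e-6):
    # Central differences of the explicit solution, for comparison.
    up, vp = solve(x + h, y)
    um, vm = solve(x - h, y)
    return (up - um) / (2.0 * h), (vp - vm) / (2.0 * h)


if __name__ == "__main__":
    print(derivatives_formula(5.0, 6.0))
    print(derivatives_difference(5.0, 6.0))
```

At (x, y) = (5, 6) the solution is u = 3, v = 2, and J = u - v = 1 is different from zero, so the hypotheses of the theorem hold there.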


Proof. The proof rests very heavily on use of Theorem II, §8.2. To begin
with, we observe that the two partial derivatives ∂F/∂v, ∂G/∂v cannot both vanish at
the point P_0, for if they did, the Jacobian J would vanish there, contrary to the
hypothesis. For definiteness assume that ∂F/∂v does not vanish at P_0. We are now
able to apply Theorem II to the equation F(x, y, z, u, v) = 0, taking n = 4,
x_1 = x, x_2 = y, x_3 = z, x_4 = u, and letting v play the part of the variable
called z in Theorem II.

As a result we obtain a function

v = φ(x, y, z, u)

defined for (x, y, z, u) in a neighborhood of (x_0, y_0, z_0, u_0), and furnishing a
solution of the equation F(x, y, z, u, v) = 0 for v. Next we substitute in G, writing

H(x, y, z, u) = G(x, y, z, u, φ(x, y, z, u)).

The equation G(x, y, z, u, v) = 0 is thus replaced by the equation H(x, y, z, u) = 0.
We seek to solve this equation for u. As a condition for being able to solve, we
need to know that ∂H/∂u ≠ 0. Now, by the rule for composite functions,

\[ \frac{\partial H}{\partial u} = \frac{\partial G}{\partial u} + \frac{\partial G}{\partial v}\,\frac{\partial v}{\partial u}. \]

But we know from Theorem II that

\[ \frac{\partial v}{\partial u} = \frac{\partial \phi}{\partial u} = -\frac{\partial F/\partial u}{\partial F/\partial v}. \]

Hence

\[ \frac{\partial H}{\partial u} = \frac{\partial G}{\partial u} - \frac{\partial G}{\partial v}\,\frac{\partial F/\partial u}{\partial F/\partial v}, \]

or

\[ \frac{\partial H}{\partial u} = \frac{\dfrac{\partial G}{\partial u}\dfrac{\partial F}{\partial v} - \dfrac{\partial G}{\partial v}\dfrac{\partial F}{\partial u}}{\dfrac{\partial F}{\partial v}} = -\frac{J}{\partial F/\partial v}. \]

In a small neighborhood of the point P_0 neither J nor ∂F/∂v will vanish. We can
then apply Theorem II to the equation H(x, y, z, u) = 0 to obtain a solution
u = f(x, y, z). Finally, substituting in φ we obtain v as a function of x, y, z:

v = g(x, y, z) = φ(x, y, z, f(x, y, z)).

We shall omit the exact details of the limitation of the magnitudes of the
differences x - x_0, y - y_0, ... in order to validate all the foregoing arguments. In
applying Theorem II we are assured that the functions φ, f have continuous first
partial derivatives. The function g, as a composite function, will then have
continuous first partial derivatives also. The formulas (8.3-7) for the partial
derivatives of f and g have already been obtained (see (6.6-11)). We can appeal to
this earlier derivation now that we have proved the existence of f and g and the fact
that they do possess continuous partial derivatives.
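The two-step elimination used in the proof can be followed numerically. The sketch below is illustrative only: the system F = v + v^3 + u - x, G = u + u^3 + v - y is an assumption chosen so that ∂F/∂v never vanishes, the variable z is omitted for brevity, and Newton's method stands in for the abstract appeal to Theorem II. It solves F = 0 for v = φ(x, u), forms H(x, y, u) = G(u, φ), and uses the relation ∂H/∂u = -J/(∂F/∂v) derived in the proof.

```python
def newton(func, dfunc, t0, steps=60):
    # Plain Newton iteration with a fixed number of steps.
    t = t0
    for _ in range(steps):
        t -= func(t) / dfunc(t)
    return t


def phi(x, u):
    # Step 1: solve F = v + v^3 + u - x = 0 for v (F_v = 1 + 3 v^2 > 0).
    return newton(lambda v: v + v ** 3 + u - x,
                  lambda v: 1.0 + 3.0 * v * v, 1.0)


def H(x, y, u):
    # Step 2: substitute v = phi(x, u) into G = u + u^3 + v - y.
    return u + u ** 3 + phi(x, u) - y


def dH_du(x, y, u):
    # The relation from the proof: dH/du = -J / F_v, with J = d(F,G)/d(u,v).
    v = phi(x, u)
    F_u, F_v = 1.0, 1.0 + 3.0 * v * v
    G_u, G_v = 1.0 + 3.0 * u * u, 1.0
    J = F_u * G_v - F_v * G_u
    return -J / F_v


def solve(x, y):
    # Step 3: solve H = 0 for u, then recover v = phi(x, u).
    u = newton(lambda t: H(x, y, t), lambda t: dH_du(x, y, t), 1.0)
    return u, phi(x, u)


if __name__ == "__main__":
    print(solve(3.0, 3.0))   # the point u = v = 1 satisfies both equations
    print(solve(3.1, 2.9))   # a nearby point
```

The starting values 1.0 are tied to the base point u_0 = v_0 = 1 (with x_0 = y_0 = 3); for another base point they would have to be changed.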

We shall not take space to state formally the analogue of Theorem III for
systems of more than two equations. The nonvanishing of the appropriate
Jacobian is the key condition. The proof of the general theorem for r equations
may be made by mathematical induction on r. The proof for r = 1 is that of
Theorem II; hence all that is necessary is to make the step from r to r + 1. This
is not difficult, and may be patterned after the proof of Theorem III, which is the
step from r = 1 to r = 2. Suggestions for this work are contained in Exercises
10, 11.


EXERCISES 

1. Do there exist functions f(x, y), g(x, y), continuous in a neighborhood of (0, 1),
such that f(0, 1) = 1, g(0, 1) = -1, and such that

[f(x, y)]^3 + x g(x, y) - y = 0,

[g(x, y)]^3 + y f(x, y) - x = 0?

Explain your answer.

2. Suppose that the three equations

u^2 + v^2 + w^2 - x^2 = 0, u^2 + v^2 - y^2 = 0, u^2 + w^2 - z^2 = 0

are satisfied by a particular set of values (x_0, y_0, z_0, u_0, v_0, w_0) of the variables. (a) What
condition on this set of values is sufficient to insure that all “nearby” sets (x, y, z, u, v, w)
satisfying the three equations are given by equations u = f(x, y, z), v = g(x, y, z),
w = h(x, y, z), where f, g, h are single-valued and continuous, with values u_0, v_0, w_0
respectively at (x_0, y_0, z_0)? (b) Solve the given equations explicitly for u^2, v^2, w^2, and from
the solutions so found explain what happens if the sufficient condition in part (a) is not
satisfied.


3. Suppose that (x_0, y_0, z_0, u_0, v_0, w_0) satisfy the equations

\[ u^2 + v^2 + w^2 = 1, \qquad \frac{u^2}{x^2} + \frac{v^2}{y^2} + \frac{w^2}{z^2} = 1. \]


What are sufficient conditions which guarantee that all “nearby” sets satisfying these
equations can be represented in the form

u = f(x, y, z, w), v = g(x, y, z, w)?

4. Let (x_0, y_0, z_0, u_0) satisfy the equations

f(x) + f(y) + f(z) = F(u)
g(x) + g(y) + g(z) = G(u)
h(x) + h(y) + h(z) = H(u),

where all the functions involved have continuous derivatives. (a) State a sufficient
condition for being able to solve for x, y, z in terms of u in the neighborhood of the given
point. (b) What does the condition amount to in case f(x) = x, g(x) = x^2, h(x) = x^3?

5. The locus defined by the equations x^2 + y^2 + z^2 - r^2 = 0, x + y + z - c = 0 may be
interpreted as the circle of intersection of a plane and a sphere. If (x_0, y_0, z_0) is a point of
this locus, under what sufficient condition on x_0, y_0, z_0 may the part of the locus near
(x_0, y_0, z_0) be represented in the form y = f(x), z = g(x)?

6. Let (x_0, y_0, z_0) be a point of the locus defined by z^2 + xy - a = 0, z^2 + x^2 - y^2 - b = 0.
(a) Under what sufficient conditions on x_0, y_0 may the part of the locus near (x_0, y_0, z_0)
be represented in the form x = f(z), y = g(z)? (b) What are sufficient conditions on
x_0, y_0, z_0 for representing this part of the locus in the form x = f(y), z = g(y)?

7. State carefully, in full detail, the analogue of Theorem III for the
implicit-function problem arising from the equations

F(x, y, u, v, w) = 0, G(x, y, u, v, w) = 0, H(x, y, u, v, w) = 0,

where it is desired to find solutions

u = f(x, y), v = g(x, y), w = h(x, y)

in the neighborhood of (x_0, y_0, u_0, v_0, w_0). For the proof assume that ∂(F, G)/∂(v, w) ≠ 0 and
suppose that the equations F = 0, G = 0 are solved for v and w in terms of x, y, u. Let
K(x, y, u) = H(x, y, u, v, w) when v and w are expressed in terms of x, y, u. Show that

\[ \frac{\partial K}{\partial u} = \frac{\partial(F, G, H)}{\partial(u, v, w)} \bigg/ \frac{\partial(F, G)}{\partial(v, w)}. \]

With these suggestions, write out a complete proof.

8. Suppose that F(x, y, u, v) = x^2 + y^2 - 2ux + 1,

G(x, y, u, v) = x^2 + y^2 + 2vy - 1.

(a) Interpret u and v as parameters, and plot the curves F = 0, G = 0 in the xy-plane,
assuming u^2 ≥ 1. (b) Now suppose x_0, y_0, u_0, v_0 satisfy the equations F = 0, G = 0, and
that u_0^2 > 1. Explain geometrically why it is reasonable to expect that, if u and v differ but
slightly from u_0 and v_0, respectively, the equations F = 0, G = 0 will determine a unique
point (x, y), if this point is required to be sufficiently near (x_0, y_0). (c) Show that a set of
values x_0, y_0, u_0, v_0 cannot satisfy the three equations F = 0, G = 0, ∂(F, G)/∂(x, y) = 0 unless
u_0^2 = 1. Use this result and an appropriate version of the implicit-function theorem for
simultaneous equations to give an analytical explanation of the situation described in part (b).

9. Suppose that (x_0, y_0, z_0, u_0, v_0, w_0) satisfy the equations

\[ \frac{x^2}{a^2 + u} + \frac{y^2}{b^2 + u} + \frac{z^2}{c^2 + u} = 1, \]
\[ \frac{x^2}{a^2 + v} + \frac{y^2}{b^2 + v} + \frac{z^2}{c^2 + v} = 1, \]
\[ \frac{x^2}{a^2 + w} + \frac{y^2}{b^2 + w} + \frac{z^2}{c^2 + w} = 1, \]

where 0 < c < b < a and -c^2 < u_0, -b^2 < v_0 < -c^2, -a^2 < w_0 < -b^2. Show that the
equations have a unique solution x = f(u, v, w), y = g(u, v, w), z = h(u, v, w), where f, g, h are
continuous at (u_0, v_0, w_0) and take on the values x_0, y_0, z_0 respectively there, provided that
x_0 y_0 z_0 ≠ 0. Prove conclusively that the appropriate Jacobian is not zero.

10. Assuming that

\[ \frac{\partial(F_1, \ldots, F_{r-1})}{\partial(u_1, \ldots, u_{r-1})} \neq 0, \]

let u_i = φ_i(x_1, ..., x_n, u_r) (i = 1, ..., r - 1) be solutions of the first r - 1 equations in the
system

F_1(x_1, ..., x_n, u_1, ..., u_r) = 0
. . . . . . . . . .
F_r(x_1, ..., x_n, u_1, ..., u_r) = 0

for u_1, ..., u_{r-1} in terms of x_1, ..., x_n, u_r. Let

G(x_1, ..., x_n, u_r) = F_r(x_1, ..., x_n, φ_1, ..., φ_{r-1}, u_r),

where for convenience we have written merely φ_i instead of φ_i(x_1, ..., x_n, u_r). Show that

\[ \frac{\partial G}{\partial u_r}\,\frac{\partial(F_1, \ldots, F_{r-1})}{\partial(u_1, \ldots, u_{r-1})} = \frac{\partial(F_1, \ldots, F_r)}{\partial(u_1, \ldots, u_r)} \]

is an identity in x_1, ..., x_n, u_r.

Suggestion: Differentiate the identities

F_1(x_1, ..., x_n, φ_1, ..., φ_{r-1}, u_r) = 0
. . . . . . . . . .
F_{r-1}(x_1, ..., x_n, φ_1, ..., φ_{r-1}, u_r) = 0
F_r(x_1, ..., x_n, φ_1, ..., φ_{r-1}, u_r) - G(x_1, ..., x_n, u_r) = 0

with respect to u_r and regard the resulting equations as a set of r linear equations in
∂φ_1/∂u_r, ..., ∂φ_{r-1}/∂u_r, ∂G/∂u_r as unknowns. Solving this system for ∂G/∂u_r by
determinants gives the required result.

11. State the theorem analogous to Theorem III for functions u_i = φ_i(x_1, ..., x_n)
defined by equations F_i(x_1, ..., x_n, u_1, ..., u_r) = 0, where

\[ \frac{\partial(F_1, \ldots, F_r)}{\partial(u_1, \ldots, u_r)} \neq 0. \]

Use the results of Exercise 10 to prove the induction step from r - 1 to r, and so prove the theorem.



9 / THE INVERSE 
FUNCTION THEOREM 
WITH APPLICATIONS 


9 / INTRODUCTION 

In Chapter 8 we discussed implicit function theory, which deals with problems 
in which we have a system of m simultaneous (not necessarily linear) 
equations in n unknowns, where n > m. The question is, “When can the system 
be solved to express some m of the variables as functions of the other n - m?” 
In §8.1 we gave detailed consideration to the case in which m = 1 and
n = 3. In §8.2 we extended this to the case of one equation in n + 1 variables, and
in Theorem III of §8.3 we went as far as showing how the same methods could
provide an answer for a system of two equations, each having five variables.
From this point, one could infer what the general theorem must be for systems 
of m equations in n variables. 

In this chapter we switch our attention slightly to what is called inverse 
function theory. The problem here arises when we have a system of n simul- 
taneous equations which transform one ordered n -tuple of numbers into another, 
that is, a system of the following kind: 


f^(1)(x_1, x_2, ..., x_n) = y_1
f^(2)(x_1, x_2, ..., x_n) = y_2
. . . . . . . . . .
f^(n)(x_1, x_2, ..., x_n) = y_n (9-1)

When we substitute an ordered n-tuple of numbers into the functions on the left,
the n values which we get form another ordered n-tuple of numbers on the right.
The system (9-1) transforms the n-tuple (x_1, x_2, ..., x_n) into (y_1, y_2, ..., y_n), and
we shall find it convenient to speak of system (9-1) as a transformation. The
problem which we wish to consider is, “Given transformation (9-1), when is it
possible, in theory at least, to solve for the x’s in terms of the y’s?”

Notice that we already have the answer in the special case where the given 
functions on the left are linear. The transformation then reduces to 

f^(1)(x_1, ..., x_n) = a_{11} x_1 + a_{12} x_2 + ... + a_{1n} x_n = y_1
f^(2)(x_1, ..., x_n) = a_{21} x_1 + a_{22} x_2 + ... + a_{2n} x_n = y_2
. . . . . . . . . .
f^(n)(x_1, ..., x_n) = a_{n1} x_1 + a_{n2} x_2 + ... + a_{nn} x_n = y_n (9-2)

Cramer’s rule tells us the following: If the coefficient determinant

\[ A = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} \neq 0, \tag{9-3} \]

then for all values of the y’s the x’s are
uniquely determined. Furthermore, the x’s turn out to be linear combinations of
the y’s, just as in (9-2) the y’s are linear combinations of the x’s. This means
that the solution has the form


b_{11} y_1 + b_{12} y_2 + ... + b_{1n} y_n = x_1
b_{21} y_1 + b_{22} y_2 + ... + b_{2n} y_n = x_2
. . . . . . . . . .
b_{n1} y_1 + b_{n2} y_2 + ... + b_{nn} y_n = x_n (9-4)

The transformation represented by (9-4) is called the inverse of that represented
by (9-2). If one is interested in knowing just what values the b’s must have in
(9-4), they can be found by taking the solution of (9-2) given by Cramer’s rule,
that is,

x_m = A_m / A (m = 1, 2, ..., n),

where A_m is the determinant obtained from A by replacing the mth column
of A by the column of y’s on the right in (9-2). Expanding each of the A_m by minors
of the y’s in the mth column gives us (9-4).
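Cramer’s rule as just described translates directly into a short program. The sketch below is an illustration under stated assumptions: the function names `det` and `cramer` and the 3-by-3 test system are introduced here, and the determinant is computed by cofactor expansion along the first row, which is practical only for small n.

```python
from fractions import Fraction


def det(m):
    # Determinant by cofactor expansion along the first row (small n only).
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def cramer(a, y):
    """Solve the linear system a x = y by Cramer's rule, x_m = A_m / A."""
    d = det(a)
    if d == 0:
        raise ValueError("coefficient determinant is zero")
    xs = []
    for m in range(len(a)):
        # A_m: replace the m-th column of the coefficient matrix by the y's.
        a_m = [row[:m] + [y[i]] + row[m + 1:] for i, row in enumerate(a)]
        xs.append(Fraction(det(a_m), d))
    return xs


if __name__ == "__main__":
    a = [[1, 1, 1], [0, 2, 5], [2, 5, -1]]
    y = [6, -4, 27]
    print(cramer(a, y))
```

Exact fractions are used so that the answer is not blurred by rounding; for large systems elimination methods are far more economical than determinants.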

So Cramer’s rule is the best known example of an inverse function theorem,
and it is the most satisfactory, in the sense that it gives a complete answer to the
inverse function question, in those cases to which it applies. But unfortunately it
applies only to those cases of (9-1) in which all the functions are linear, that is,
in those cases where (9-1) reduces to (9-2). Before leaving this very special case,
it is important to notice a certain feature which it exhibits. In (9-2)

\[ f^{(i)}_j = \frac{\partial f^{(i)}}{\partial x_j} = a_{ij}. \]

Therefore, the coefficient determinant, A, can also be written in
the following form,

\[ \begin{vmatrix} f^{(1)}_1 & f^{(1)}_2 & \cdots & f^{(1)}_n \\ f^{(2)}_1 & f^{(2)}_2 & \cdots & f^{(2)}_n \\ \vdots & \vdots & & \vdots \\ f^{(n)}_1 & f^{(n)}_2 & \cdots & f^{(n)}_n \end{vmatrix} \]

which is seen to be

\[ \frac{\partial(f^{(1)}, f^{(2)}, \ldots, f^{(n)})}{\partial(x_1, x_2, \ldots, x_n)}, \]

that is, the Jacobian of the f’s with respect
to the x’s. This same Jacobian determinant is going to turn up again in the
nonlinear case.

Mathematicians frequently try to solve problems in very general language so
that their solutions will apply to as large a class of problems as possible. We
have been discussing the linear version of the inverse function theorem
in language sufficiently general to include any finite number of variables. We shall
now illustrate what we have been saying in an extremely simple case, with n = 2.


Example 1. Does the transformation

2u + 3v = x
u + 2v = y (9-5)

have an inverse? If so, find it.

The coefficient determinant, A, is

\[ \begin{vmatrix} 2 & 3 \\ 1 & 2 \end{vmatrix} = 4 - 3 = 1 \neq 0. \]

The fact that an inverse transformation does exist follows immediately from the
fact that A ≠ 0. We easily find the inverse to be

2x - 3y = u
-x + 2y = v. (9-6)


In this example, we tried to distinguish between two problems: showing that
an inverse transformation exists and then actually finding it. The reason is that
when we come to the nonlinear case, we shall frequently find ourselves in the
position of being able to prove that an inverse transformation exists, but not
having any idea how to find it.

Before leaving Example 1, it is worthwhile to remind ourselves what it
means to say that in equations (9-6) we have solved the given equations (9-5) for
u and v in terms of x and y. It means that if we substitute in equations (9-5) for
u and v their values as given by (9-6) the result is an identity. If we make these
substitutions we get

2(2x - 3y) + 3(-x + 2y) = x,

and (9-7)

(2x - 3y) + 2(-x + 2y) = y.

These equations immediately reduce to

x = x,

and (9-8)

y = y,

which is the identity transformation in the xy-plane. In (9-7) we have performed
the composition of the transformations (9-5) and (9-6) by first performing (9-6)
on a point (x, y) and then performing (9-5) on the result. Equations (9-8) show
that we ended up right where we started, with the point (x, y), and therefore
(9-6) followed by (9-5) is really just the identity transformation. The student will
now find it easy to show that if he or she starts with a point (u, v) and transforms
it by (9-5), and then transforms that result by (9-6) he or she will end up with
(u, v) again. So whichever one of the transformations (9-5) and (9-6) we start
with, if we immediately follow it by applying the other to the output of the first,
the result of all this is to leave everything exactly as it was, which is why we
call it the identity transformation. From this we see that (9-5) is the inverse of
(9-6) for exactly the same reason that (9-6) is the inverse of (9-5).

It is frequently convenient to introduce a symbol to denote a transformation. 
If we denote by T the transformation defined by equations (9-5), then we can 
write (9-5) simply as T(u, v) = (x, y). The standard notation for the inverse of a 
transformation T is T^{-1}, so (9-6) becomes T^{-1}(x, y) = (u, v). What we have said
at greater length about the relation of a transformation to its inverse (for the
linear case) is

T^{-1}[T(u, v)] = (u, v) for all (u, v)

and

T[T^{-1}(x, y)] = (x, y) for all (x, y).

Since T^{-1}[T(a, b)] = T[T^{-1}(a, b)] = (a, b) for all ordered pairs (a, b), we say that

T^{-1}T = TT^{-1} = I

where I is the identity on the set of all ordered pairs of real numbers. 

Much of what we have said in the preceding paragraph, growing out of our 
study of the linear transformation (9-5), is also true of nonlinear trans- 
formations, but more attention to the domains of the transformations is required. 
This is illustrated in one dimension by considering the familiar nonlinear
function f(x) = 1 + x^2. This transforms any real x into u = 1 + x^2. If we solve for
x we obtain two solutions: x = g(u) = √(u - 1) and x = h(u) = -√(u - 1). (We are
considering real numbers only, so we must have u ≥ 1.) Observe that f(g(u)) =
1 + (u - 1) = u and f(h(u)) = u whenever u ≥ 1. It is not the case, however, that
both g and h are inverses of f. In fact, neither is an inverse of f. We see that f(x)
is always in the domain of g as well as in the domain of h. But

g(f(x)) = √(1 + x^2 - 1) = √(x^2) = |x|

so that g(f(x)) = x if x ≥ 0, but g(f(x)) = -x if x < 0. Thus it is not true that
g(f(x)) = x for all x in the domain of f. Accordingly, g is not an inverse of f,
even though f(g(u)) = u for all u in the domain of g. The trouble is that f has too
big a domain to have an inverse. There is a standard way of getting around this
kind of difficulty. We restrict f to a smaller domain, doing it in such a way that in
the restricted domain f never takes on the same value at two different points. If
we define f_1(x) = f(x) with restriction x ≥ 0, then g(f_1(x)) = x for all x in the
domain of f_1, and f_1(g(u)) = u for all u in the domain of g. Thus f_1 and g are
inverse to each other. Similarly, if we define f_2(x) = f(x) when x ≤ 0, then it may
be verified that f_2 and h are inverse to each other.
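
The behavior of g and h just described is easy to observe numerically. The sketch below (Python, standard library only; the sample values are arbitrary) shows that g(f(x)) returns |x| rather than x, while f(g(u)) and f(h(u)) both return u.

```python
import math

def f(x):
    return 1 + x * x

def g(u):
    return math.sqrt(u - 1)    # defined for u >= 1

def h(u):
    return -math.sqrt(u - 1)   # defined for u >= 1

for x in (-3.0, -0.5, 0.0, 2.0):
    # g undoes f only up to sign: g(f(x)) = |x|, h(f(x)) = -|x|
    assert math.isclose(g(f(x)), abs(x), abs_tol=1e-12)
    assert math.isclose(h(f(x)), -abs(x), abs_tol=1e-12)

for u in (1.0, 2.0, 10.0):
    # f undoes both g and h on their common domain u >= 1
    assert math.isclose(f(g(u)), u)
    assert math.isclose(f(h(u)), u)
```

Restricting x to x ≥ 0 removes the first two negative sample values, and on what remains g(f(x)) = x, which is the statement that f_1 and g are inverse to each other.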

Observe that the range R of f_1 is the domain of g and the range of g is the
domain D of f_1. There are two distinct identity functions here. We shall denote them
by I_R and I_D, respectively. I_R is defined by

I_R(u) = f_1(g(u)) = u for all u in R

and I_D is defined by

I_D(x) = g(f_1(x)) = x for all x in D.

These equations may be written more briefly as I_R = f_1 ∘ g and I_D = g ∘ f_1. The
small circle between the functional symbols denotes the process of composition 
of two functions. 

Another example of the principle of restriction of the domain of a function 
in order to obtain a function that has an inverse is furnished by the definition of 
the inverse sine function. If we start with f(x) = sin x and by restriction define
f_1(x) = sin x with domain composed of x's for which |x| ≤ π/2, then y = f_1(x) is
equivalent to x = sin^{-1} y, with |y| ≤ 1. Here the inverse function is g(y) = sin^{-1} y.

9.1 / THE INVERSE FUNCTION THEOREM IN TWO DIMENSIONS 

We are now ready for the main subject of this chapter, which is the inverse 
function theorem for nonlinear transformations in spaces of dimension two and 
higher. Beginning in two dimensions we can just as well simplify the notation of 
(9-1) to

f(u,v) = x, g(u,v) = y. (9.1-1) 

We want to know under what conditions we can solve for u and v as functions 
of x and y. When this can be done, we shall have 

u = F(x, y), v = G(x, y). (9.1-2) 

where F and G are functions such that 

f[F(x, y), G(x, y)] = x, g[F(x, y), G(x, y)] = y, (9.1-3)

at least under certain restrictions on the domains of the functions involved. And 
under these restrictions we shall say that the transformation defined by (9.1-2) is 
the inverse of that defined in (9.1-1). 

We shall deal with our problem by treating it as a special case of the implicit 
function problem that is solved by Theorem III, §8.3. For this purpose we regard
equations (9.1-1) as a pair of equations of the form

φ(x, y, u, v) = 0, ψ(x, y, u, v) = 0,

which we look upon as analogues of equations (8.3-1). Here, however, there is
no z. This difference is not important, for the considerations of §8.3 apply
equally well if the functions in (8.3-1) do not depend on z. We observe that if
φ(x, y, u, v) = f(u, v) - x and ψ(x, y, u, v) = g(u, v) - y, we have

∂(φ, ψ)/∂(u, v) = ∂(f, g)/∂(u, v) = J (9.1-4)

as the Jacobian to be considered in our application of Theorem III, §8.3 to the
present situation. Because of the difference in notation between our present 
situation and that of §8.3, the analogues of the functions f, g of §8.3 will be the 
functions F, G of (9.1-2). By applying the theorem of §8.3 to our present situation 
we obtain the following theorem. 

THEOREM I. Suppose that the functions f(u, v), g(u, v) are continuous, with
continuous first partial derivatives, in some neighborhood of (u_0, v_0), and
suppose that the Jacobian J in (9.1-4) is not zero at (u_0, v_0). Let x_0 = f(u_0, v_0),
y_0 = g(u_0, v_0).

Then, for suitably restricted rectangular neighborhoods R in the xy-plane,
with center (x_0, y_0), and S in the uv-plane, with center (u_0, v_0) (and with
the sides of the rectangles parallel to a co-ordinate axis in each case), it can
be arranged so that J ≠ 0 when (u, v) is in S and that for each (x, y) in R
there is a unique (u, v) in S for which the equations (9.1-1) are satisfied. In
this way there are defined two functions F(x, y) and G(x, y) with domain R
such that equations (9.1-3) are satisfied for every (x, y) in R. Moreover, if
(x, y) is in R the equations (9.1-2) provide the only (u, v) in S such that
equations (9.1-1) are satisfied. In addition, F and G are continuous and have
continuous first partial derivatives in R, given by the formulas

F_1 = g_2/J, G_1 = -g_1/J, F_2 = -f_2/J, G_2 = f_1/J (9.1-5)

where F_1 stands for F_1(x, y), g_2 stands for g_2[F(x, y), G(x, y)] (and likewise
for the other partial derivatives), and J is evaluated at (u, v) with u and v
expressed in terms of x and y.

Finally, the Jacobian j = ∂(F, G)/∂(x, y) is related to J by the formula

j = 1/J. (9.1-6)

We comment briefly on (9.1-5) and (9.1-6). Equations (9.1-5) are obtained 
from (8.3-7) by working out what the formulas in (8.3-7) become when applied 
to the notations of our present situation. The very important equation (9.1-6) 
results at once when we substitute the values of F_1, G_1, F_2, G_2 from (9.1-5) into
the determinant which is j. 
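
The substitution that leads to (9.1-6) can be sketched in a few lines of Python. Here f1, f2, g1, g2 denote the first partial derivatives of f and g with respect to u and v at one point; the numerical values are hypothetical sample numbers, and the helper name inverse_partials is an illustrative choice, not notation from the text.

```python
import math

def inverse_partials(f1, f2, g1, g2):
    """Apply (9.1-5): return F_1, G_1, F_2, G_2 from the partials of f and g."""
    J = f1 * g2 - f2 * g1          # the Jacobian (9.1-4)
    if J == 0:
        raise ValueError("Jacobian J is zero; (9.1-5) does not apply")
    return g2 / J, -g1 / J, -f2 / J, f1 / J

# Hypothetical sample values of the partial derivatives at a single point.
f1, f2, g1, g2 = 4.0, 2.0, 4.0, -2.0
J = f1 * g2 - f2 * g1
F1, G1, F2, G2 = inverse_partials(f1, f2, g1, g2)

# j is the determinant with rows (F_1, F_2) and (G_1, G_2).
j = F1 * G2 - F2 * G1
assert math.isclose(j, 1 / J)      # relation (9.1-6)
```

In general j = (g_2 f_1 - f_2 g_1)/J^2 = J/J^2 = 1/J, which is the one-line computation behind the check.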

To get from Theorem I the precise situation of a transformation T and its 
inverse T^{-1} we must restrict the domain of the transformation defined by the
equations (9.1-1). The theorem tells us there is a one-to-one correspondence
between a part of S and all of R. The part of S in question (we shall call it E) is
the set of those points (u, v) in S given by (9.1-2) as the point (x, y) varies over R
when F and G are the functions defined in Theorem I. If T denotes the
transformation from E to R defined by (9.1-1) when the domain of the
transformation is E, the inverse T^{-1} is the transformation from R to E defined
by (9.1-2) with R as its domain.

We have been talking about transformations and their inverses, but we have
referred to Theorem I as an inverse function theorem. The occurrence of the 
two words “function” and “transformation” makes this an appropriate place to 
speak about an enormous broadening of the function concept that enables us to 
regard a transformation as a particular kind of function. Heretofore we have 
been speaking about functions as having for domain a set of points on the real 
line or in space of two or more dimensions, and as range a set of numbers. These 
are numerically valued functions. But the general notion of a function, say f, is
that of a correspondence involving two sets of things, one set called the domain
of f and the other the range of f, such that to each member P of the domain
corresponds a well-determined member of the range, denoted by f(P) and called 
the value of f at P. The members of the domain do not have to be numbers or 
points. There is no logical need to restrict the type of objects that compose the 
domain of a function. The same comments apply to the range. A transformation 
such as that defined by (9.1-1) is a function whose domain is a set in the 
uv-plane and whose range is a set in the xy-plane. If we consider the equations
(9-1) at the beginning of Chapter 9, we can regard the n functions f^(1),
f^(2), ..., f^(n) as defining a single function whose domain is a certain set of ordered
n-tuples (x_1, ..., x_n) and whose range is another set of ordered n-tuples
(y_1, ..., y_n). Ordered n-tuples can be regarded as points in a space of n dimensions.
They can also be regarded as vectors in a vector space (see Chapters 10
and 11). Therefore a transformation such as that defined by (9-1) is a function 
that is defined on a set of points (or vectors) and has points (or vectors) as 
values. We can also have a transformation with domain in a space of n
dimensions and range in a space of m dimensions, where m can be different
from n. In this chapter we consider mainly the case in which n = m = 2.

Let us look further at Theorem I. There is nothing in its statement that tells 
us explicitly anything about the range E of the inverse transformation T^{-1} (as
defined in a preceding paragraph). The domain R of T^{-1} is the interior of a
rectangle with center (x_0, y_0); therefore R is an open set, all of its points being
interior points. (Open sets and interior points are defined in §5.1.) However, we
can infer that E is an open set, also, so that (u_0, v_0) and all the other points of E
are interior points of E. We can see this as follows by applying what we have
learned from Theorem I, for Theorem I can be applied to the transformation T^{-1}
by reversing the roles of the uv-plane and the xy-plane. We know from (9.1-6)
that the Jacobian j of (9.1-2) is not zero when (x, y) is in R. Hence, if we start
with any point (x_1, y_1) in R and the point (u_1, v_1) of E into which T^{-1} transforms
(x_1, y_1), the application of Theorem I tells us that there is a rectangular
neighborhood S_1 of (u_1, v_1) and a rectangular neighborhood R_1 of (x_1, y_1) (contained in
R) such that to each point (u, v) in S_1 there is a unique point (x, y) in R_1 such
that T^{-1} transforms (x, y) into (u, v). But, from what we already know, (u, v)
must be in E. Therefore S_1 is in E, and so (u_1, v_1) is an interior point of E. This
shows that E is an open set.

The set E is also connected in the sense of §7.4. We shall not prove this 
here. It is a consequence of the connectedness of R and the continuity of T^{-1}.
See Miscellaneous Exercise 3 at the end of this chapter.

We illustrate the discussion of T, T^{-1}, and E in Fig. 56.

In the following Example 2 we have a situation in which it is possible,
because of the comparative simplicity of the transformations, to show precisely
the appearance of the set E into which T^{-1} transforms R.

Example 2. A transformation which we may call T is given by the equations

x = (1/2)(u^2 + v^2), y = (1/2)(u^2 - v^2). (9.1-7)

If we try to solve for u and v we are led immediately to

u^2 = x + y and v^2 = x - y,

which shows that as long as we consider T as being defined for all values of u
and v, the equations (9.1-7) simply do not determine u and v uniquely for given
values of x and y. For each pair (x, y) we have to consider the following set of
possibilities for (u, v): (√(x + y), √(x - y)), (√(x + y), -√(x - y)), (-√(x + y),
√(x - y)), and (-√(x + y), -√(x - y)). Actually, none of these possibilities exists
unless the two radicands are both nonnegative. This means that unless both
x + y ≥ 0 and x - y ≥ 0, there is no pair (u, v) which T transforms into (x, y). It is
also obvious from the first equation of (9.1-7) that x ≥ 0. The transformation T
transforms all points in the (u, v)-plane into points in the (x, y)-plane satisfying
the three conditions y ≥ -x, y ≤ x and x ≥ 0. The points satisfying these
conditions are those belonging to the wedge-shaped region W in the right half
plane with vertex at the origin, upper edge lying along y = x and lower edge
along y = -x. (See Fig. 57a.) If (x, y) is an interior point of W, then all four
possibilities for (u, v) are realized, one in each quadrant of the (u, v)-plane.
Each point on the boundary of W is the image of two points in the (u, v)-plane,
except for the vertex which is the image of only u = 0, v = 0.
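
The count of pre-images (four for interior points of W, two on the boundary, one at the vertex, none outside W) can be illustrated with a short Python sketch. The helper name preimages is hypothetical; the formulas are those of (9.1-7) and of the solved equations u^2 = x + y, v^2 = x - y.

```python
import math

def T(u, v):
    """The transformation (9.1-7)."""
    return ((u * u + v * v) / 2, (u * u - v * v) / 2)

def preimages(x, y):
    """All (u, v) with T(u, v) = (x, y); empty if (x, y) lies outside W."""
    if x + y < 0 or x - y < 0:
        return set()
    a, b = math.sqrt(x + y), math.sqrt(x - y)
    # Coincident sign choices (when a or b is zero) collapse inside the set.
    return {(su * a, sv * b) for su in (1, -1) for sv in (1, -1)}

assert len(preimages(10, 6)) == 4    # interior point of W
assert len(preimages(3, 3)) == 2     # boundary point, on y = x
assert len(preimages(0, 0)) == 1     # the vertex
assert preimages(1, 2) == set()      # outside W, since x - y < 0
assert all(T(u, v) == (10, 6) for (u, v) in preimages(10, 6))
```

The point (10, 6) is chosen because its radicands are perfect squares, so the comparison with T(u, v) is exact in floating-point arithmetic.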

Suppose now that we restrict the domain of T to those points (u, v) lying
interior to the first quadrant. Denote this restricted domain by Q_1 and let T_1
denote the restriction of T to Q_1; T_1 transforms Q_1 into the interior of the wedge
W in a one-to-one manner, so T_1 does have an inverse, namely the transformation
T_1^{-1} defined by u = √(x + y) and v = √(x - y), where (x, y) is any
interior point of W. Thus T_1^{-1}[T_1(u, v)] = (u, v) for all (u, v) in Q_1 and
T_1[T_1^{-1}(x, y)] = (x, y) for all (x, y) interior to W.

We can make other restrictions T_2, T_3, T_4 of T by limiting (u, v) to the
interior of the second, third, and fourth quadrants, respectively. Each of these
restrictions has an inverse. For example, the inverse T_2^{-1} is defined by
u = -√(x + y), v = √(x - y), where (x, y) is in the interior of W.

Fig. 57b.

Let us examine further the nature of the restricted transformation T_1 and its
inverse T_1^{-1}. The Jacobian in this case is found to be J = -2uv, so J = 0 when
u = 0 or v = 0. Hence J ≠ 0 in the interior of each of the quadrants of the
uv-plane. For the study of T_1 we consider just the first quadrant. Let us choose
the point u_0 = 4, v_0 = 2. Then x_0 = 10, y_0 = 6, so that T_1(4, 2) = (10, 6). Theorem I
speaks about rectangles R in the xy-plane and S in the uv-plane, centered at
(x_0, y_0) and (u_0, v_0), respectively. The theorem tells us nothing about the sizes of
R and S, but with our explicit formulas for T_1 and T_1^{-1} let us see what we can
find out about possible choices of R and S. Since R cannot extend outside of W,
and since its center is at (10, 6), the greatest possible expansion of R would be
obtained by placing its upper left hand corner on the line y = x that forms part of
the boundary of W. If this R is made narrow and tall, that corner will approach
the point (10, 10); if R is made wide, with small height, that corner will approach
the point (6, 6). In any case, the corners of R will lie on the lines x - y = 0,
x + y = 20, x - y = 8, and x + y = 12. See Fig. 57a, in which R is represented in
the special case when it is a square. It is inscribed in another square A whose 
sides lie on the foregoing four lines. We consider A as the open set forming the 
interior of the square. 

We shall investigate what happens to R and A when they are mapped into 
the uv-plane by the transformation T_1^{-1}. We shall see that A is mapped into a
rectangle B and that R is mapped into a certain set within B. If R is any 
rectangle with center (10,6) inscribed in A (not merely the special case when R 
is square), a suitable choice for S can be found, as we shall describe (see Fig. 
57b). 

Consider the points in A on any line x + y = a, where 12 < a < 20. The value
of u in (u, v) = T_1^{-1}(x, y) on any such line segment is always equal to √a, but
v = √(x - y). Now the lines x - y = b running through A are those for which
0 < b < 8 (see Fig. 57a), and so 0 < v < √8. As (x, y) takes on all positions in A,
(u, v) takes on all positions inside the rectangle B whose sides are the lines
u = √12, u = √20, v = 0, v = √8, and the correspondence is one to one. Thus
T_1^{-1} maps A onto all of the open set which is the interior of the rectangle B.

Because R is a part of A it follows that the image E = T_1^{-1}(R) is a subset of
B. Hence we can take for S the interior of any rectangle centered at (4, 2), with
each side parallel to either the u-axis or v-axis, provided that S lies in the first
quadrant and contains B. The rectangle with sides u = 3, u = 5, v = 0, v = 4
would do.
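
The claims that T_1^{-1} carries A into B, and that the rectangle 3 < u < 5, 0 < v < 4 contains B, can be spot-checked by sampling. The following Python sketch uses only the formulas and bounds stated above; the function names and the choice of 1000 pseudo-random sample points are illustrative assumptions, and a sampling test is of course not a proof.

```python
import math
import random

def T1_inv(x, y):
    """The inverse T_1^{-1}: u = sqrt(x + y), v = sqrt(x - y)."""
    return (math.sqrt(x + y), math.sqrt(x - y))

def in_A(x, y):
    """Interior of the square A bounded by x + y = 12, 20 and x - y = 0, 8."""
    return 12 < x + y < 20 and 0 < x - y < 8

def in_B(u, v):
    """Interior of the rectangle B."""
    return math.sqrt(12) < u < math.sqrt(20) and 0 < v < math.sqrt(8)

def in_S(u, v):
    """Interior of the rectangle with sides u = 3, u = 5, v = 0, v = 4."""
    return 3 < u < 5 and 0 < v < 4

random.seed(0)
checked = 0
while checked < 1000:
    # Sample the bounding box 6 < x < 14, 2 < y < 10 and keep points of A.
    x, y = random.uniform(6, 14), random.uniform(2, 10)
    if in_A(x, y):
        u, v = T1_inv(x, y)
        assert in_B(u, v) and in_S(u, v)
        checked += 1
```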

We can determine the set E as follows, by considering the families of circles 

u^2 + v^2 = 2x

and hyperbolas

u^2 - v^2 = 2y.

To be specific, as x varies from 6 to 14, each of the lines on which x is constant,
with 6 < x < 14, intersects A in a segment that is carried (by T_1^{-1}) into an arc of a
circle in the first quadrant of the uv-plane. The arc is that part of the circle in the
interior of the rectangle B. When 6 < x < 10, the circle cuts the two adjacent
sides v = 0 and u = √12 of B; when 10 < x < 14, the circle cuts the sides
u = √20 and v = √8. See Figures 57a and 57b. Similarly, as y varies from 2 to
10, each of the lines on which y is constant, with 2 < y < 10, intersects A in a
segment that is transformed by T_1^{-1} into an arc of a hyperbola cutting through the
interior of B. When 2 < y < 6, the hyperbolic arc intersects the sides u = √12 and
v = √8 of B, and when 6 < y < 10, the arc intersects the sides v = 0 and
u = √20. Now, to find the part of B that corresponds to R (we denote it by E),
we must locate in B just those portions of the circular arcs that correspond 
to the segments of the lines x = constant that cut through the interior of R. It is 
clear that these portions of the circular arcs will be cut off between the two 
hyperbolic arcs that correspond to the lines y = constant which form the top and 
bottom of R. In this way we see that the set E is the interior of a curvilinear 
quadrilateral lying in B and bounded by two circular arcs and two hyperbolic 
arcs. Figure 57b shows the case of the E corresponding to R when R is a
square. Incidentally, one sees that the part of B not in E or on its boundary 
consists of four separate pieces, each of which is the image of a triangular piece 
of A exterior to R. 

EXERCISE 

In the equations x = f(u, v), y = g(u, v) regard u, v as first-class variables and x, y as
second-class variables, the relation between the classes being that expressed by (9.1-1).
From this point of view the chain rule gives

1 = (∂f/∂u)(∂u/∂x) + (∂f/∂v)(∂v/∂x),

where ∂u/∂x means ∂F/∂x and ∂v/∂x means ∂G/∂x. Carry on with the chain rule in this manner to
obtain three other equations, and then solve them, thus obtaining equations (9.1-5).

9.2 / MAPPINGS 

Suppose that 

x = f(u, v), y = g(u, v) (9.2-1)

is some transformation and 

u = F(x, y), v = G(x, y) (9.2-2) 

is its inverse. In the previous section we have interpreted (x, y) as rectangular 
co-ordinates in one plane, and (u, v) as rectangular co-ordinates in another plane.
We sometimes find it convenient to say that the point (u, v) is mapped into the
point (x, y). Equations (9.2-1) are said to define a mapping, or a point
transformation. If the functions f, g are defined in a certain region of the uv-plane, we
say that this region is mapped into the xy-plane. The configuration in the second
plane is called the image of the configuration in the first plane, and that in the
first plane is sometimes called the pre-image of the one in the second.

The concept of a mapping does not require that the transformation have an 
inverse. Nor is it necessary, for the mere concept of a mapping, to deal with 
continuous or differentiable transformations. However, we shall study con- 
tinuously differentiable transformations for which the Jacobian is in general 
nonzero. The inverse transformation (9.2-2) then defines a mapping of a portion 
of the xy-plane into the uv-plane.

It is of interest, in studying mappings, to consider what becomes of a 
configuration of points in one plane, such as a curve or a region, when it is 
mapped into the other plane. When the mapping is continuous, with continuous 
inverse, the mapping process may be conceived intuitively as a deformation, with 
stretching and shrinking of varying amounts, but no tearing or puncturing. If the 
Jacobian of the mapping is zero at a certain point, however, there may fail to be an 
inverse mapping. This can occur, for example, because several points in one plane 
are mapped into the same point in the other plane. 

Again, we emphasize that even if there are points where the Jacobian 
vanishes, this does not necessarily mean that no inverse exists in any neighbor- 
hood of these points. The following extremely simple example helps to guard 
against this unwarranted assumption. If x = u^3 and y = v, then the Jacobian of
the transformation is 3u^2, which vanishes at all points where u = 0, that is, at all
points of the v-axis. Nevertheless the transformation maps the entire uv-plane
onto the entire xy-plane in a one-to-one manner and therefore has an inverse,
u = x^(1/3), v = y, which is valid everywhere; we do not even have to restrict the
domain of the original transformation. 
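
A brief numerical sketch of this example follows (Python; the function names are illustrative). It confirms that the Jacobian vanishes where u = 0 and that the real cube root nevertheless inverts the transformation at sample points on both sides of that line.

```python
import math

def T(u, v):
    """The transformation x = u^3, y = v."""
    return (u ** 3, v)

def T_inv(x, y):
    """u = real cube root of x, v = y. copysign handles negative x."""
    return (math.copysign(abs(x) ** (1.0 / 3.0), x), y)

def jacobian(u, v):
    """d(x, y)/d(u, v) = 3u^2."""
    return 3 * u * u

assert jacobian(0.0, 5.0) == 0.0           # Jacobian vanishes where u = 0
for u, v in [(-2.0, 1.0), (0.0, -3.0), (1.5, 0.25)]:
    uu, vv = T_inv(*T(u, v))
    assert math.isclose(uu, u, abs_tol=1e-12) and vv == v
```

The inverse exists everywhere, but it is not differentiable where x = 0, which is consistent with Theorem I: the theorem gives sufficient conditions for a differentiable inverse, not necessary conditions for an inverse.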

The following example guards against an error of the opposite sort. It shows 
that even if the Jacobian is different from zero at every point, we may still have 
to restrict the domain of the transformation considerably in order to get an 
inverse. 

Example 1. Consider the mapping 

u = e^x cos y, v = e^x sin y, (9.2-3)

of the xy-plane into the uv-plane. We shall investigate the nature of the mapping
by finding out what happens to lines x = constant or y = constant.

We observe that

u^2 + v^2 = e^{2x}, v cos y = u sin y. (9.2-4)

Thus the image of any point on the line x = x_0 is a point on the circle in the
uv-plane, with center at the origin and radius e^{x_0}. This circle is defined
parametrically by equations (9.2-3) with x = x_0 and y as parameter. The point
(u, v) goes counterclockwise once around the circle each time that y increases
by 2π. The image of a point on the line y = y_0 lies on a straight line through the
origin in the uv-plane, with slope tan y_0. The image of the entire line y = y_0 is just
one of the two rays composing the line v cos y = u sin y, however. For instance,
if y_0 = π/3, the point (u, v) corresponding to (x, y_0) is given by

u = (1/2) e^x, v = (√3/2) e^x,

so that u and v are always positive. The mapping of the line y = π/3 onto the ray
is indicated in Fig. 58; the part of the ray corresponding to x < 0 is shown by a
dotted line.



Fig. 58. 



If we now consider the strip in the xy-plane between the lines y = 0, y = 2π,
we see that the image of the strip is the entire uv-plane with the exception of the
origin. Line segments x = x_0 crossing the strip map into circles centered at the
origin, of radius less than one if x_0 < 0, and of radius greater than one if x_0 > 0.
Lines y = y_0 map into rays, the angle between the ray and the positive u-axis
being y_0. The origin in the uv-plane is not obtained as the image of any point in
the xy-plane, but (u, v) → (0, 0) as x → -∞. The nature of the mapping is suggested
by Fig. 59a and Fig. 59b, in which certain corresponding areas are
indicated by similar shading.



Fig. 59a. 


Fig. 59b.

The equations (9.2-3) admit the period 2π for the variable y. Hence any
other strip of width 2π parallel to the x-axis is mapped onto the uv-plane in a
manner similar to that just described. 

The Jacobian of the mapping is 


∂(u, v)/∂(x, y) = | e^x cos y   -e^x sin y |
                  | e^x sin y    e^x cos y |  = e^{2x}.


It does not vanish for any value of x. The nonvanishing of the Jacobian means 
that if we fix our attention on any point (x_0, y_0) and its image (u_0, v_0), the mapping
sets up a one-to-one correspondence between the points of a sufficiently small
neighborhood of (x_0, y_0) and those of some neighborhood of (u_0, v_0). This is
described by saying that the mapping is locally one-to-one, or one-to-one in the
small. The xy-plane as a whole is not mapped in a one-to-one manner on the
uv-plane, however, for many points in the xy-plane have the same image in the
uv-plane (two such points are at least 2π units apart, however). Hence we say
that the mapping is not one-to-one in the large.
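
The distinction between the two senses of one-to-one can be seen numerically. The sketch below (Python; the sample point is arbitrary) evaluates the mapping (9.2-3) at a point and at the point 2π higher, and compares the determinant of first partial derivatives with e^{2x}.

```python
import math

def T(x, y):
    """The mapping (9.2-3)."""
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def jacobian(x, y):
    """Determinant of the first partial derivatives of (9.2-3)."""
    ex = math.exp(x)
    ux, uy = ex * math.cos(y), -ex * math.sin(y)
    vx, vy = ex * math.sin(y), ex * math.cos(y)
    return ux * vy - uy * vx

x0, y0 = 0.3, math.pi / 3
u1, v1 = T(x0, y0)
u2, v2 = T(x0, y0 + 2 * math.pi)

# Two distinct points, 2*pi apart, with the same image: not one-to-one in the large.
assert math.isclose(u1, u2, abs_tol=1e-12) and math.isclose(v1, v2, abs_tol=1e-12)

# The Jacobian equals e^{2x} and so is never zero: one-to-one in the small.
assert math.isclose(jacobian(x0, y0), math.exp(2 * x0))
```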

Example 2. Consider the mapping 

u = x^2 - y^2, v = 2xy. (9.2-5)

In this example let us investigate the configurations in the xy-plane which
map into lines u = constant or v = constant in the uv-plane. This is equivalent to
regarding u and v as curvilinear co-ordinates and studying the u-curves and
v-curves. (For a general discussion of curvilinear co-ordinates see §9.5. A u-curve is
a curve in the xy-plane on which u is a constant and v is a parameter.) We see
readily that the u-curves are rectangular hyperbolas, with foci on the x-axis if u > 0,
and on the y-axis if u < 0. The v-curves are rectangular hyperbolas having the x and
y axes as asymptotes. For u = 0 we have the pair of lines y = ±x, and for v = 0 the
pair of lines x = 0, y = 0.

The image of the hyperbola u_0 = x^2 - y^2 is the line u = u_0. Note, however,
that each branch of the hyperbola maps onto the entire line. Likewise, each
branch of the hyperbola v_0 = 2xy maps onto the entire line v = v_0 (see Fig. 60a,
Fig. 60b).

Fig. 60a.

Fig. 61b.


We now see that certain cells with curved sides in the xy-plane are mapped
into rectangular cells in the uv-plane (see Fig. 61a, 61b).

The Jacobian of the mapping (9.2-5) is 


∂(u, v)/∂(x, y) = 4(x^2 + y^2).


Hence the mapping is locally one-to-one except in the neighborhood of the origin 
in the xy-plane. 
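
Why the origin is exceptional can be illustrated as follows. In this Python sketch (integer sample points chosen arbitrarily), the points (x, y) and (-x, -y) always have the same image under (9.2-5); since such pairs occur arbitrarily close to the origin, no neighborhood of the origin is mapped in a one-to-one manner.

```python
def T(x, y):
    """The mapping (9.2-5)."""
    return (x * x - y * y, 2 * x * y)

def jacobian(x, y):
    """d(u, v)/d(x, y) = 4(x^2 + y^2)."""
    return 4 * (x * x + y * y)

for x, y in [(1, 2), (-3, 1), (2, -5)]:
    assert T(x, y) == T(-x, -y)     # antipodal points share an image
    assert jacobian(x, y) > 0       # yet the Jacobian is nonzero away from the origin

assert jacobian(0, 0) == 0          # the only point where the Jacobian vanishes
```

Away from the origin the two pre-images (x, y) and (-x, -y) are separated, so a sufficiently small neighborhood of either one contains no such pair.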


The notion of a point transformation, or mapping, is not limited to two- 
dimensional problems. We may, for example, speak of mappings from the 
xyz-space into uvw-space.

Observe that in the case of mappings from (x, y) to (u, v) which are locally
one-to-one, we may interpret u and v as curvilinear co-ordinates in the xy-plane,
or we may interpret x and y as curvilinear co-ordinates in the uv-plane. Similar
remarks apply when there are more variables in each set.

The sign of the Jacobian has a significance which deserves mention. Consider
a mapping with Jacobian not zero at (x_0, y_0), so that the mapping is, at least
locally, one-to-one. If C is a small closed curve enclosing the point (x_0, y_0) in the
xy-plane, the image of C in the uv-plane will be a small closed curve C'
enclosing the point (u_0, v_0) which corresponds to (x_0, y_0). If the point (x, y) goes
around C in the counterclockwise sense, the image point (u, v) will go around C'.
But will (u, v) go counterclockwise also? The answer depends on the sign of the
Jacobian of the mapping. If J > 0, (u, v) will go around C' in the same sense that
(x, y) goes around C; but if J < 0, (u, v) will go in the sense opposite to that of
(x, y). We shall not prove this fact just now. See Exercise 7 and §15.32.
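
Although the proof is deferred, the statement can be tried out numerically. The Python sketch below approximates a small circle C by a polygon traversed counterclockwise and uses the signed area of the image polygon (shoelace formula) to detect the sense of traversal. The map T_pos is (9.2-5); the map T_neg, obtained by interchanging u and v, is a hypothetical example introduced here only because its Jacobian is negative. The center (1, 1), radius 0.1, and 200 vertices are arbitrary choices.

```python
import math

def signed_area(points):
    """Shoelace formula: positive when the polygon is traversed counterclockwise."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2

def circle(cx, cy, r, n=200):
    """Vertices of a polygon approximating a circle, in counterclockwise order."""
    return [(cx + r * math.cos(2 * math.pi * k / n),
             cy + r * math.sin(2 * math.pi * k / n)) for k in range(n)]

def T_pos(x, y):
    """(9.2-5): Jacobian 4(x^2 + y^2) > 0 away from the origin."""
    return (x * x - y * y, 2 * x * y)

def T_neg(x, y):
    """Hypothetical variant with u and v interchanged: Jacobian -4(x^2 + y^2) < 0."""
    return (2 * x * y, x * x - y * y)

C = circle(1.0, 1.0, 0.1)
assert signed_area(C) > 0                              # C is counterclockwise
assert signed_area([T_pos(*p) for p in C]) > 0         # J > 0: same sense
assert signed_area([T_neg(*p) for p in C]) < 0         # J < 0: opposite sense
```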


EXERCISES 

1. Consider the mapping x = au, y = bv, where a > 0, b > 0. Find out what region in
the uv-plane corresponds to the region in the xy-plane bounded by the ellipse (x^2/a^2) +
(y^2/b^2) = 1.

2. Let R be the region in the xy-plane bounded by the lines x - y = 0, x + y = 0,
x - 2y = 2. Find the region R' in the uv-plane onto which R is mapped by the equations
u = 2x - y, v = x - 2y.

3. Find the image in the uv-plane of the triangular region bounded by x = 0, y = 0,
x + y = 1, if u = x + y, v = x - y.

4. Find the image in the xy-plane of the rectangle in the uv-plane bounded by u = 1,
u = 4, v = 9, v = 16, if the mapping is x = ((u + v)/2)^{1/2}, y = ((v - u)/2)^{1/2}.

5. Consider the mapping u = (1/2)(x - y), v = √(xy). Draw a number of the u-curves and
v-curves in the xy-plane. What regions in the xy-plane map into the rectangle in the
uv-plane bounded by u = -§, u = §, v = 2, v = 4?

6. Study the mapping x = u - uv, y = uv. Find the inverse transformation. Identify
the u-curves and v-curves in the xy-plane, and the x-curves and y-curves in the uv-plane.

7. Let R be the region in the xy-plane bounded by the lines y - x = 2, y - x = 6,
y + x = 4, y + x = 8. Let R' be the region in the uv-plane onto which R is mapped by the
transformation u = (1/2)(x + y), v = (1/2)(x - y). Examine the direction in which a point goes
around the boundary of R' when its correspondent goes around the boundary of R
counterclockwise. Compare with the sign of the Jacobian and note the remark at the end
of §9.2.

8. A mapping from the xy-plane into the uv-plane is defined by

u = x/(1 - x - y), v = y/(1 - x - y).

(a) Find ∂(u, v)/∂(x, y). (b) Find the inverse transformation and ∂(x, y)/∂(u, v). (c) What are the
u-curves and v-curves in the xy-plane? (d) What are the x-curves and y-curves in the
uv-plane? (e) Find the region R in the xy-plane corresponding to the square in the
uv-plane bounded by the lines u = -5, u = -1, v = -1, v = -§.

9. A mapping is defined by

u = x^2/(1 - x^2 - y^2), v = y^2/(1 - x^2 - y^2).

(a) What are the u-curves in the xy-plane if u > 0? if -1 < u < 0? if u < -1?
(b) Answer similar questions for the v-curves. (c) Find where ∂(u, v)/∂(x, y) = 0. (d) Draw

the curves m=|, m = 3, v=i, v = l and mark four regions in the xy-plane which 
correspond to the rectangle bounded by the lines u = 3, etc., in the mu -plane, (e) Follow 
directions similar to those in (d) for u = u — - 2, u = - 1, v = - 5. 


9.3 / SUCCESSIVE MAPPINGS 

Suppose we have a transformation mapping (x, y) into (u, v), and a further
transformation mapping (u, v) into (ξ, η). The effect of performing these two
transformations in succession is to give a mapping from the xy-plane into the
ξη-plane.

Example 1. Suppose

u = x + y, v = x - y,

and

ξ = uv, η = u + v.

Then

ξ = x^2 - y^2, η = 2x.


The single mapping which is produced by carrying out two successive 
transformations is called the resultant , or product , of the two transformations. 
In the above example, we have 

∂(u, v)/∂(x, y) = -2, ∂(ξ, η)/∂(u, v) = v - u, ∂(ξ, η)/∂(x, y) = 4y,

as the student should verify for himself. Observe that

v - u = -2y.

Hence

∂(ξ, η)/∂(u, v) · ∂(u, v)/∂(x, y) = (v - u)(-2) = 4y = ∂(ξ, η)/∂(x, y).

This illustrates a general truth which we now state formally. 


THEOREM II. Let T_1 denote a transformation from the xy-plane into the
uv-plane, and let T_2 denote a transformation from the uv-plane into the
ξη-plane. Let the resultant transformation from the xy-plane into the ξη-plane
be denoted by T_3. Then the Jacobian of T_3 is the product of the
Jacobians of T_1 and T_2, that is,

∂(ξ, η)/∂(x, y) = ∂(ξ, η)/∂(u, v) · ∂(u, v)/∂(x, y). (9.3-1)

It is assumed that the transformations T_1 and T_2 are continuously differentiable.
It is further assumed that the transformation T_2 is defined for points (u, v)
obtained by application of the transformation T_1 to points (x, y) in some region R
of the xy-plane.
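
As a check on the statement of Theorem II, the three Jacobians of Example 1 can be compared at a few points. This is a minimal Python sketch; the function names are illustrative, each Jacobian is written out as the 2-by-2 determinant of the relevant first partial derivatives, and integer sample points keep the arithmetic exact.

```python
def jac_T1(x, y):
    """u = x + y, v = x - y: determinant of rows (1, 1) and (1, -1)."""
    return 1 * (-1) - 1 * 1               # = -2

def jac_T2(u, v):
    """xi = u v, eta = u + v: determinant of rows (v, u) and (1, 1)."""
    return v * 1 - u * 1                  # = v - u

def jac_T3(x, y):
    """xi = x^2 - y^2, eta = 2x: determinant of rows (2x, -2y) and (2, 0)."""
    return (2 * x) * 0 - (-2 * y) * 2     # = 4y

for x, y in [(1, 2), (-3, 5), (0, -4)]:
    u, v = x + y, x - y                   # T_2 is evaluated at the image of (x, y)
    assert jac_T2(u, v) * jac_T1(x, y) == jac_T3(x, y) == 4 * y
```

Note that the Jacobian of T_2 must be evaluated at the point (u, v) into which T_1 carries (x, y), which is the content of the second assumption stated after the theorem.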


Proof of the Theorem. We regard ξ, η as functions of the first-class variables
u, v, which are in turn functions of the second-class variables x, y. Thus

∂ξ/∂x = (∂ξ/∂u)(∂u/∂x) + (∂ξ/∂v)(∂v/∂x),

∂ξ/∂y = (∂ξ/∂u)(∂u/∂y) + (∂ξ/∂v)(∂v/∂y),

with similar equations for ∂η/∂x and ∂η/∂y. It is then a matter of straightforward
multiplication to verify that

(∂ξ/∂x)(∂η/∂y) - (∂ξ/∂y)(∂η/∂x) = [(∂ξ/∂u)(∂η/∂v) - (∂ξ/∂v)(∂η/∂u)] [(∂u/∂x)(∂v/∂y) - (∂u/∂y)(∂v/∂x)].

This is exactly the relation (9.3-1). The result is recognized with less effort if one 
is familiar with the rule for writing the product of two determinants as another 
determinant. This is of especial importance in dealing with the generalization of 
the theorem for transformations in three or more variables. 
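
The product rule (9.3-1) can also be checked numerically. The following is a minimal sketch, not part of the original text: it uses the transformations of Example 1 and estimates each Jacobian by central differences. The helper names `jacobian_2d`, `t1`, `t2`, `t3`, and `check` are illustrative assumptions.

```python
def jacobian_2d(f, p, h=1e-6):
    """Central-difference estimate of the 2x2 Jacobian determinant of f at p."""
    a, b = p
    fa_plus, fa_minus = f(a + h, b), f(a - h, b)
    fb_plus, fb_minus = f(a, b + h), f(a, b - h)
    d00 = (fa_plus[0] - fa_minus[0]) / (2 * h)  # d f0 / d a
    d01 = (fb_plus[0] - fb_minus[0]) / (2 * h)  # d f0 / d b
    d10 = (fa_plus[1] - fa_minus[1]) / (2 * h)  # d f1 / d a
    d11 = (fb_plus[1] - fb_minus[1]) / (2 * h)  # d f1 / d b
    return d00 * d11 - d01 * d10


def t1(x, y):
    """T1 of Example 1: (x, y) -> (u, v)."""
    return (x + y, x - y)


def t2(u, v):
    """T2 of Example 1: (u, v) -> (xi, eta)."""
    return (u * v, u + v)


def t3(x, y):
    """Resultant T3: apply T1, then T2."""
    return t2(*t1(x, y))


def check(x, y):
    """Return the three Jacobians at (x, y): those of T1, T2, and T3."""
    j1 = jacobian_2d(t1, (x, y))
    j2 = jacobian_2d(t2, t1(x, y))  # evaluated at the image point (u, v)
    j3 = jacobian_2d(t3, (x, y))
    return j1, j2, j3
```

Note that the Jacobian of $T_2$ must be evaluated at the image point $(u, v)$ of $(x, y)$, as the hypothesis of the theorem requires. For any sample point, the third value returned by `check` should agree with the product of the first two up to rounding error.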

EXERCISES 

1. Use Theorem II to prove (9.1-6). 

2. Verify the correctness of Theorem II when the transformations $T_1$ and $T_2$ are
defined by $u = e^x \cos y$, $v = e^x \sin y$ and $\xi = u^2 + v^2$, $\eta = v/u$, respectively. Begin by
finding the equations of the transformation $T_3$.

3. Let $a_{ij}$ be the term in the ith row and jth column of a determinant of nth order
whose value is A. Let $b_{ij}$ and B refer in similar manner to a second determinant. Let C be
the value of the determinant with elements $c_{ij}$, where $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$. It is a theorem
from algebra that C = AB. With this fact in mind, consider the generalization of Theorem
II for transformations on n variables. Let $T_1$ be a transformation in which $u_1, \ldots, u_n$ are
differentiable functions of $x_1, \ldots, x_n$, and let $T_2$ be a transformation in which $\xi_1, \ldots, \xi_n$ are
differentiable functions of $u_1, \ldots, u_n$. Let $T_3$ be the resultant transformation, in which
$\xi_1, \ldots, \xi_n$ are considered as functions of $x_1, \ldots, x_n$. Use the chain rule and the theorem
on determinants just quoted to show that the Jacobian of $T_3$ is the product of the
Jacobians of $T_1$ and $T_2$.

4. Consider the transformation (9.2-1) and its inverse (9.2-2), in the neighborhood of
a point where $\partial F/\partial x \neq 0$. Then u = F(x, y) has a solution $x = \phi(u, y)$. Let
$G(\phi(u, y), y) = \Phi(u, y)$. Now consider an rs-plane. Show that the mapping (9.2-2) is the
resultant of the mapping r = F(x, y), s = y from the xy-plane to the rs-plane, and the
mapping u = r, $v = \Phi(r, s)$ from the rs-plane to the uv-plane. The Jacobians of these last
two transformations are $\partial F/\partial x$ and $\partial \Phi/\partial s$ respectively. Verify in the case of each of
these mappings the correctness of the statement made at the end of §9.2 about the
significance of the sign of the Jacobian. This may be done by a direct inspection, since the
mappings are of such simple types. Then use Theorem II to prove the correctness of the
statement in question for the mapping from the xy-plane to the uv-plane. A similar
procedure may be used when $\partial F/\partial y \neq 0$.

5. (a) Suppose n > p. Let $u_1, \ldots, u_n$ be differentiable functions of $x_1, \ldots, x_n$, with
$u_{p+1}, \ldots, u_n$ the simple functions defined by $u_i = x_i$, $i = p + 1, \ldots, n$. Show that

$$\frac{\partial(u_1, \ldots, u_n)}{\partial(x_1, \ldots, x_n)} = \frac{\partial(u_1, \ldots, u_p)}{\partial(x_1, \ldots, x_p)}.$$

(b) If $u_1 = x_1, \ldots, u_p = x_p$, and the rest of the u's are differentiable functions of
$x_1, \ldots, x_n$, show that

$$\frac{\partial(u_1, \ldots, u_n)}{\partial(x_1, \ldots, x_n)} = \frac{\partial(u_{p+1}, \ldots, u_n)}{\partial(x_{p+1}, \ldots, x_n)}.$$

6. Suppose $F_1, \ldots, F_n$ are continuously differentiable functions of $x_1, \ldots, x_n$, and
that

$$\frac{\partial(F_1, \ldots, F_p)}{\partial(x_1, \ldots, x_p)} \neq 0,$$

where p is a fixed integer, $1 \leq p < n$. Suppose the equations
$u_1 = F_1(x_1, \ldots, x_n), \ldots, u_p = F_p(x_1, \ldots, x_n)$ are solved for $x_1, \ldots, x_p$ in terms of
$u_1, \ldots, u_p$ and $x_{p+1}, \ldots, x_n$, and that these values of $x_1, \ldots, x_p$ are then substituted into
$F_{p+1}, \ldots, F_n$, giving rise to functions $\psi_i(u_1, \ldots, u_p, x_{p+1}, \ldots, x_n)$, $i = p + 1, \ldots, n$. Show that

$$\frac{\partial(F_1, \ldots, F_n)}{\partial(x_1, \ldots, x_n)} = \frac{\partial(F_1, \ldots, F_p)}{\partial(x_1, \ldots, x_p)}\,\frac{\partial(\psi_{p+1}, \ldots, \psi_n)}{\partial(x_{p+1}, \ldots, x_n)},$$

where it is assumed that $u_1, \ldots, u_p$ are replaced by

$$F_1(x_1, \ldots, x_n), \ldots, F_p(x_1, \ldots, x_n)$$

after calculating the derivatives in the last Jacobian.

Suggestions of method: Note first of all that $\psi_i(F_1, \ldots, F_p, x_{p+1}, \ldots, x_n) = F_i(x_1, \ldots, x_n)$,
$i = p + 1, \ldots, n$. Let $T_1$ and $T_2$ be defined as follows:

$$T_1:\quad \xi_1 = F_1(x_1, \ldots, x_n),\ \ldots,\ \xi_p = F_p(x_1, \ldots, x_n),\quad \xi_{p+1} = x_{p+1},\ \ldots,\ \xi_n = x_n;$$

$$T_2:\quad u_1 = \xi_1,\ \ldots,\ u_p = \xi_p,\quad u_{p+1} = \psi_{p+1}(\xi_1, \ldots, \xi_n),\ \ldots,\ u_n = \psi_n(\xi_1, \ldots, \xi_n).$$

Now apply Theorem II (for the case of n variables) and Exercise 5.


9.4 / TRANSFORMATIONS OF CO-ORDINATES 

The formulas connecting the rectangular co-ordinates (x, y) of a point and the 
rectangular co-ordinates (x', y') of the same point in a rotated system (see Fig. 
62) are 

$$x = x' \cos\phi - y' \sin\phi, \qquad y = x' \sin\phi + y' \cos\phi. \tag{9.4-1}$$

The transformation (9.4-1) has the inverse

$$x' = x \cos\phi + y \sin\phi, \qquad y' = -x \sin\phi + y \cos\phi, \tag{9.4-2}$$

as the student should verify for himself. The student should also verify that

$$\frac{\partial(x, y)}{\partial(x', y')} = \frac{\partial(x', y')}{\partial(x, y)} = 1. \tag{9.4-3}$$


The familiar formulas connecting rectangular and polar co-ordinates in the 
plane are 


$$x = r \cos\theta, \qquad y = r \sin\theta. \tag{9.4-4}$$

The Jacobian of this transformation is

$$\frac{\partial(x, y)}{\partial(r, \theta)} = \begin{vmatrix} \cos\theta & -r \sin\theta \\ \sin\theta & r \cos\theta \end{vmatrix} = r. \tag{9.4-5}$$

For convenience we shall assume that $r \geq 0$ in our dealings with polar
co-ordinates. If we attempt to solve for r, $\theta$ in terms of x, y, we find

$$r = (x^2 + y^2)^{1/2}, \qquad \sin\theta = \frac{y}{(x^2 + y^2)^{1/2}}, \qquad \cos\theta = \frac{x}{(x^2 + y^2)^{1/2}}. \tag{9.4-6}$$

To get this far we assume that $r \neq 0$, that is, that the Jacobian is not zero. Note
that a unique formula for $\theta$ cannot be found, because $\theta$ may be changed by any
multiple of $2\pi$ without affecting (9.4-6). This non-uniqueness does not violate
the statement of Theorem I, §9.1, however. (Why not?)

It is worth while to consider briefly the question of an explicit formula for $\theta$
in terms of x and y. From (9.4-6) one is tempted to write

$$\theta = \tan^{-1}\left(\frac{y}{x}\right).$$

There are many things wrong with this formula, however, if one is seeking the
transformation inverse to (9.4-4). If we follow the usual custom of restricting the
inverse tangent to its principal value, we limit ourselves to the range
$-\pi/2 < \theta < \pi/2$. Any other convention about principal values leads to a similar difficulty. On
the other hand, if we ignore principal values, and regard the inverse tangent as a
multiple-valued function, we get into trouble, because the values we then get for
$\theta$ will not all satisfy (9.4-6). Thus, if x = -1, y = 1, the values of $\theta$ such that
$\tan\theta = -1$ are either in the second or fourth quadrant. The second-quadrant
values satisfy (9.4-6), while the fourth-quadrant values do not. Still another
difficulty is that $\tan^{-1}(y/x)$ is discontinuous at points on the y-axis, whereas the
transformation (9.4-4) has a continuous inverse in the neighborhood of such a
point (if the point is not the origin).

The inverse tangent (or the inverse cotangent) can be used if care is taken,
of course. Thus, if $r_0 = \sqrt{2}$, $\theta_0 = \tfrac{3}{4}\pi$, $x_0 = -1$, $y_0 = 1$, the transformation inverse to
(9.4-4) can be written

$$r = (x^2 + y^2)^{1/2}, \qquad \theta = \tan^{-1}\left(\frac{y}{x}\right) + \pi$$

as long as x < 0; here the principal value of the inverse tangent is to be used.
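
The branch of the inverse just described can be tried out numerically. The sketch below is an illustration only, with hypothetical function names; it uses `math.atan` for the principal value of the inverse tangent and is valid, as in the text, only in the half-plane x < 0.

```python
import math


def to_polar_left_half_plane(x, y):
    """Inverse of x = r cos(theta), y = r sin(theta), valid only for x < 0.

    Uses the principal value of the inverse tangent shifted by pi, so that
    theta lies strictly between pi/2 and 3*pi/2.
    """
    if x >= 0:
        raise ValueError("this branch of the inverse assumes x < 0")
    r = math.hypot(x, y)
    theta = math.atan(y / x) + math.pi
    return r, theta


def from_polar(r, theta):
    """The transformation (9.4-4)."""
    return r * math.cos(theta), r * math.sin(theta)
```

At the point (-1, 1) the shifted formula returns $\theta = \tfrac{3}{4}\pi$, and `from_polar` carries the result back to (-1, 1). The unshifted principal value $\tan^{-1}(-1) = -\pi/4$ leads instead to the fourth-quadrant point (1, -1), which is the failure described in the preceding paragraph.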

Let us now examine the nature of a rectangular or polar co-ordinate system
from a point of view which will be fruitful in our systematic study of
transformations. We consider the xy-plane with a fixed set of axes. A point $(x_0, y_0)$ in
the plane is located by the statement that it is at the intersection of the two lines
$x = x_0$, $y = y_0$. A co-ordinate system is basically a method of locating points by
setting up a correspondence between points and sets of numbers; the numbers
corresponding to a point are called co-ordinates of the point. For an arbitrary
rectangular co-ordinate system we need two one-parameter families of parallel
lines, the lines of one family intersecting the lines of the other family at right
angles. Let the equations of the two families be

$$a_1 x + b_1 y = u, \qquad a_2 x + b_2 y = v, \tag{9.4-7}$$

where the a's and b's are fixed, and u and v are parameters of the families. For
the required perpendicularity we must have $b_1 : a_1 = -a_2 : b_2$, or

$$a_1 a_2 + b_1 b_2 = 0. \tag{9.4-8}$$

If we now pick out values $u_0$, $v_0$, the two lines $u = u_0$, $v = v_0$ intersect at a unique
point $(x_0, y_0)$ (see Fig. 63). This point is determined by $(u_0, v_0)$, and we may take
these latter numbers as co-ordinates of the point in a new co-ordinate system.
The xy-system and the uv-system are then related by the transformation (9.4-7).
The uv-system will not in general be obtained from the xy-system by a rigid
motion, for the units of distance along the u-axis and the v-axis will in general be
different from each other and from the common unit of distance along the x and y
axes.

Fig. 63.

If we drop the requirement (9.4-8), but insist that the two families in (9.4-7)
not be identical, we get, in general, an oblique co-ordinate system.

The system of polar co-ordinates is also based on locating a point at an
intersection, but in this case we have a family of concentric circles and a family of
rays through the common center of the circles (see Fig. 64). The equation of the
circles is

$$x^2 + y^2 = r^2, \tag{9.4-9}$$

with parameter r. The family of rays is given by taking in each case the suitable
half of the line

$$y \cos\theta = x \sin\theta, \tag{9.4-10}$$

with $\theta$ as parameter. A point is located by specifying that it is at the intersection
of a circle $r = r_0$ and a ray $\theta = \theta_0$. It will be observed that in general a point is at
the intersection of just one circle and just one ray. The exceptional point is the
origin. This is precisely the point at which the Jacobian $\partial(x, y)/\partial(r, \theta) = 0$. It is called a
singular point of the polar co-ordinate system.

Fig. 64.

In §9.5 we shall study more general co-ordinate systems arising from the 
study of two one-parameter families of curves of a fairly arbitrary character. 
Co-ordinate systems of such a general type are called curvilinear co-ordinate 
systems. 
9.5 / CURVILINEAR CO-ORDINATES

A system of curvilinear co-ordinates can be defined if we have two one- 
parameter families of curves subject to suitable conditions. The conditions may 
not be satisfied over the whole plane, but only in certain regions. Among the 
conditions which are normally required is the condition that each point in a 
region under consideration should lie on one and only one curve of each of the 
two families, and that these curves should not be tangent at the point. Before 
discussing the general theory it will be helpful to consider an example. 

Example 1. Suppose that the two families of curves are

$$y^2 = -u^2(x - u^2), \tag{9.5-1}$$

$$y^2 = v^2(x + v^2), \tag{9.5-2}$$

where the parameters are u, v.

The curves of both families are parabolas with the x-axis as axis of
symmetry. The parabolas (9.5-1) open to the left; the parabolas of the other
family open to the right. The vertex of (9.5-1) is at $(u^2, 0)$, and its focus is a
distance $\tfrac{1}{4}u^2$ from the vertex. The y-intercepts are $(0, \pm u^2)$. A number of the
curves of both families are shown in Fig. 65.

The parameters u, v may be used as co-ordinates; that is, a point can be
located by giving the values of u and v corresponding to the two parabolas
which pass through the point. Actually, a u-parabola and a v-parabola intersect
twice, so that a pair of values (u, v) does not determine a unique point.
Conversely, a given point (x, y) does not determine u and v uniquely, because
the equations of the parabolas are not affected by a change in the sign of u or v.

Let us attempt to solve equations (9.5-1) and (9.5-2) for x and y. Subtracting
(9.5-2) from (9.5-1), we find

$$0 = -x(u^2 + v^2) + u^4 - v^4,$$

or

$$x = \frac{u^4 - v^4}{u^2 + v^2} = u^2 - v^2,$$

provided u and v are not both zero. If this result is substituted in (9.5-2), we get

$$y^2 = u^2 v^2, \quad \text{or} \quad y = \pm uv.$$

There are thus two possible differentiable transformations,

$$x = u^2 - v^2, \qquad y = uv, \tag{9.5-3}$$

and

$$x = u^2 - v^2, \qquad y = -uv, \tag{9.5-4}$$

which may be used to determine a point (x, y) by giving values of u, v. Either
transformation allows us to use u and v as curvilinear co-ordinates. Suppose, for
example, that we use (9.5-3). There is some arbitrariness in the choice of signs
for u and v, but the product uv must have the same sign as y. We may, for
instance, use $u \geq 0$ all the time; then we must use v > 0 when y > 0 and v < 0
when y < 0. Figure 65 has been labeled in accord with this choice. Other choices
are possible, however. From (9.5-3) we find

$$\frac{\partial(x, y)}{\partial(u, v)} = 2(u^2 + v^2).$$

This is zero only when u = v = 0, which by (9.5-3) is equivalent to x = y = 0. The
origin is called a singular point of the uv-curvilinear co-ordinate system. It is not
possible to use u and v as co-ordinates, throughout a region having the origin as
an interior point, in such a way as to have a one-to-one correspondence between
(x, y) and (u, v) with the transformation from (u, v) to (x, y) and the inverse
transformation from (x, y) to (u, v) both continuous throughout the region.
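
The claims of Example 1 are easy to test at sample points. The following sketch is an illustration with hypothetical function names, not a construction from the text: it maps (u, v) to (x, y) by (9.5-3), evaluates the left side minus the right side of each parabola equation, and computes the Jacobian from the exact partial derivatives.

```python
def to_xy(u, v):
    """Transformation (9.5-3): x = u^2 - v^2, y = u v."""
    return u * u - v * v, u * v


def residual_u_parabola(x, y, u):
    """y^2 + u^2 (x - u^2); this is zero exactly when (x, y) lies on (9.5-1)."""
    return y * y + u * u * (x - u * u)


def residual_v_parabola(x, y, v):
    """y^2 - v^2 (x + v^2); this is zero exactly when (x, y) lies on (9.5-2)."""
    return y * y - v * v * (x + v * v)


def jacobian(u, v):
    """Determinant of [[2u, -2v], [v, u]], the Jacobian matrix of (9.5-3)."""
    return (2 * u) * u - (-2 * v) * v
```

Both residuals vanish (up to rounding) at every image point, and `jacobian(u, v)` equals $2(u^2 + v^2)$. Since `to_xy(u, v)` and `to_xy(-u, -v)` return the same point, the sketch also exhibits the sign ambiguity that forces a convention such as $u \geq 0$.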


Now let us consider the general theory of curvilinear co-ordinates in the
plane. Suppose we have given a continuously differentiable transformation

$$x = f(u, v), \qquad y = g(u, v), \tag{9.5-5}$$

such that the Jacobian

$$J = \frac{\partial(f, g)}{\partial(u, v)}$$

is not equal to zero for a certain pair of values $u_0$, $v_0$. Denote the corresponding
values of x and y by $x_0$, $y_0$. The following discussion will relate to pairs (x, y)
sufficiently near $(x_0, y_0)$ and pairs (u, v) sufficiently near $(u_0, v_0)$ so that we can
use the conclusions described in Theorem I, §9.1. In particular, there is a
continuously differentiable inverse transformation

$$u = F(x, y), \qquad v = G(x, y). \tag{9.5-6}$$

Let us denote the Jacobian of the inverse transformation by j. By (9.1-6) of
Theorem I we know that j = 1/J.
The one-to-one correspondence between (x, y) and (u, v), as established by
the transformation and its inverse, makes it possible for us to use (u, v) as
co-ordinates for the point (x, y), since the point determines its co-ordinates by
(9.5-6), and the co-ordinates determine the point by (9.5-5). It is desirable,
however, to have a geometric interpretation of the uv-co-ordinate system. A
geometric interpretation can be given in terms of two families of curves which
form a mesh of quadrilaterals in somewhat the same way that the lines
x = constant, y = constant, form a rectangular mesh. If we regard u as constant
and let v vary, equations (9.5-5) are parametric equations of a curve, v being the
parameter. We call such a curve a u-curve. Similarly, a v-curve is defined by
(9.5-5) with v fixed and u as a parameter. Alternatively, these curves may be
thought of as defined by (9.5-6). With u constant, a u-curve is defined by
F(x, y) = u. If we confine our attention to a small neighborhood of a point where
$J \neq 0$, there will be exactly one u-curve and one v-curve through each point of
the neighborhood, and these two curves will not be tangent at the point. We shall
now show that the nontangency is a consequence of the fact that $J \neq 0$. The
slope of a u-curve may be found from (9.5-5), with u constant and v the
parameter. The slope is

$$\frac{dy}{dx} = \frac{\partial g}{\partial v} \bigg/ \frac{\partial f}{\partial v}.$$

Likewise, the slope of a v-curve is

$$\frac{dy}{dx} = \frac{\partial g}{\partial u} \bigg/ \frac{\partial f}{\partial u}.$$

These slopes are not equal, for their equality would imply that

$$\frac{\partial g}{\partial v}\frac{\partial f}{\partial u} = \frac{\partial g}{\partial u}\frac{\partial f}{\partial v},$$

or J = 0, contrary to our assumption.
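
The same conclusion can be phrased with tangent vectors instead of slopes, which also covers a point where one of the curves has a vertical tangent. The following short computation is offered as a supplementary sketch; the symbols $\mathbf{t}_u$ and $\mathbf{t}_v$ are notation introduced here and do not appear in the text.

```latex
% Tangent vector to the u-curve (u fixed, v the parameter) and to the
% v-curve (v fixed, u the parameter), both taken from (9.5-5):
\mathbf{t}_u = \left(\frac{\partial f}{\partial v},\ \frac{\partial g}{\partial v}\right),
\qquad
\mathbf{t}_v = \left(\frac{\partial f}{\partial u},\ \frac{\partial g}{\partial u}\right).

% Two plane vectors are parallel (or one of them is zero) exactly when
% the determinant formed from their components vanishes:
\begin{vmatrix}
  \dfrac{\partial f}{\partial u} & \dfrac{\partial f}{\partial v} \\[2ex]
  \dfrac{\partial g}{\partial u} & \dfrac{\partial g}{\partial v}
\end{vmatrix}
= \frac{\partial f}{\partial u}\frac{\partial g}{\partial v}
- \frac{\partial f}{\partial v}\frac{\partial g}{\partial u}
= J .
```

Thus $J \neq 0$ says that neither tangent vector is zero and that the two are not parallel, which is the nontangency asserted above.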

We can define curvilinear co-ordinates in three dimensions also. The basis
for a set of such co-ordinates u, v, w is a transformation

$$x = f(u, v, w), \qquad y = g(u, v, w), \qquad z = h(u, v, w), \tag{9.5-8}$$

with nonvanishing Jacobian, and the inverse transformation

$$u = F(x, y, z), \qquad v = G(x, y, z), \qquad w = H(x, y, z). \tag{9.5-9}$$

The equations (9.5-9) define three one-parameter families of surfaces, called
u-surfaces, v-surfaces, and w-surfaces respectively. A u-surface and a v-surface
intersect in a curve, which we shall call a uv-curve. Similarly, there are
vw-curves and wu-curves. The fact that the Jacobian of the transformation
(9.5-8) is not zero implies that the three curves, one of each type, intersecting at
a point, are pair-wise nontangent there.

The most familiar examples of curvilinear co-ordinates in three dimensions 
are cylindrical co-ordinates and spherical co-ordinates. A great variety of 
three-dimensional systems can be generated by starting with a two-dimensional
system and rotating the plane about a line in the plane. The two families of 
curves in the plane generate surfaces in space. Half-planes through the axis of 
rotation form a third set of surfaces. Spherical co-ordinates are derived from 
plane polar co-ordinates in this way. 

Example 2. Consider the transformation

$$x = uv \cos\theta, \qquad y = uv \sin\theta, \qquad z = u^2 - v^2, \tag{9.5-10}$$

with Jacobian

$$\frac{\partial(x, y, z)}{\partial(u, v, \theta)} = 2uv(u^2 + v^2). \tag{9.5-11}$$

Let us assume $u \geq 0$, $v \geq 0$, and write $r = (x^2 + y^2)^{1/2} = uv$. Then we see from
(9.5-10) that

$$x = r \cos\theta, \qquad y = r \sin\theta.$$

The equations

$$r = uv, \qquad z = u^2 - v^2 \tag{9.5-12}$$

can be regarded as defining a set of plane curvilinear co-ordinates (u, v) in the
rz-plane. With rotation about the z-axis, using r, $\theta$ as polar co-ordinates in the
xy-plane, we obtain equations (9.5-10), which we can use to establish u, v, $\theta$ as
curvilinear co-ordinates in space. The u-curves and v-curves in the rz-plane are
parabolas (compare with equations (9.5-3) and (9.5-1), (9.5-2)), a few of which
are shown in Fig. 66. In the three-dimensional system, the u-surfaces and
v-surfaces are paraboloids of revolution about the z-axis, and the $\theta$-surfaces are
half-planes with the z-axis as edge (see Fig. 67). The transformation has a
continuously differentiable inverse in the neighborhood of any point not on the
z-axis, for such points are obtained only when neither u nor v is zero, and the
Jacobian (9.5-11) is then not equal to zero. All points of the z-axis are singular
points of the $uv\theta$-co-ordinate system.
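
The value (9.5-11) can be confirmed numerically. The sketch below is illustrative only: it estimates the 3×3 Jacobian determinant of (9.5-10) by central differences, and the helper names `transform`, `det3`, `numeric_jacobian`, and `closed_form` are assumptions introduced for the purpose.

```python
import math


def transform(u, v, theta):
    """Transformation (9.5-10)."""
    return (u * v * math.cos(theta), u * v * math.sin(theta), u * u - v * v)


def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def numeric_jacobian(fn, p, h=1e-6):
    """Central-difference estimate of the Jacobian determinant of fn at p."""
    columns = []
    for k in range(3):
        plus, minus = list(p), list(p)
        plus[k] += h
        minus[k] -= h
        f_plus, f_minus = fn(*plus), fn(*minus)
        # Partial derivatives of all three outputs with respect to variable k.
        columns.append([(f_plus[i] - f_minus[i]) / (2 * h) for i in range(3)])
    # Row i holds the partial derivatives of output i.
    rows = [[columns[k][i] for k in range(3)] for i in range(3)]
    return det3(rows)


def closed_form(u, v):
    """Right-hand side of (9.5-11)."""
    return 2 * u * v * (u * u + v * v)
```

For values with u and v both positive the two results agree to within the accuracy of the difference quotient. When u = 0 or v = 0 the estimate is near zero, in keeping with the statement that the points of the z-axis are singular.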


Fig. 66.


Fig. 67.
EXERCISES


1. (a) Sketch the u-curves and v-curves in the xy-plane if x = u cosh v, y =
u sinh v. (b) Does the uv-system of curvilinear co-ordinates have a singular
point? (c) To what part of the xy-plane does the system not apply?


2. Discuss the co-ordinates (u, v) related to rectangular co-ordinates (x, y) by the
equations 2x - y + u = 0, x - 2y + 2v = 0. Draw and describe the u-curves and the
v-curves. Find the transformation and the inverse transformation corresponding to (9.5-5)
and (9.5-6) respectively. Verify Theorem III in this case.


3. Show that u, v can be used as curvilinear co-ordinates when $0 < x < \pi/2$ and
y > 0 if u = y/tan x, v = y/sin x. Solve for x and y in terms of u and v. Draw the curves
u = 1, v = 2 in the xy-plane. Compute

$$\frac{\partial(u, v)}{\partial(x, y)} \quad \text{and} \quad \frac{\partial(x, y)}{\partial(u, v)}$$

and verify that they are reciprocals.


4. Find the inverse of the transformation (9.5-3), assuming u and v positive.


5. Suppose

$$x = \frac{2u}{u^2 + v^2}, \qquad y = \frac{-2v}{u^2 + v^2}.$$

(a) Show that

$$x^2 + y^2 = \frac{4}{u^2 + v^2},$$

and hence that

$$u = \frac{2x}{x^2 + y^2}, \qquad v = \frac{-2y}{x^2 + y^2}.$$

(b) Show that the u-curves are circles through the origin with centers on the x-axis, and
that the v-curves are circles tangent to the x-axis at the origin. Draw several curves of
each family. Locate the points with the following curvilinear co-ordinates: u = 1, v = -2;
u = 0, v = 1; v = 0, u = -2.

6. Let x = cos u cosh v, y = sin u sinh v, where $0 \leq u \leq \pi$ and v is
arbitrary. (a) Describe the u-curves and the v-curves by name, and show how variation
in the sizes of u and v affects the curves. Draw a number of curves of each
family. (b) What are the singular points of the uv-curvilinear co-ordinate
system? (c) Describe the points for which u = 0; for which $u = \pi/2$; for which v = 0.

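7. Suppose that two families of curves in the xy-plane are defined by

$$\phi(x, y, u) = 0, \qquad \psi(x, y, v) = 0,$$

where $\phi$ and $\psi$ are continuously differentiable functions such that

$$\frac{\partial(\phi, \psi)}{\partial(x, y)} \neq 0, \qquad \frac{\partial \phi}{\partial u}\frac{\partial \psi}{\partial v} \neq 0$$

for values of the variables under consideration. Show that these families of curves can be
used to establish u and v as curvilinear co-ordinates. If the notations J, j are used as in
(9.1-6), with $j = \partial(u, v)/\partial(x, y)$, show that

$$\frac{\partial(\phi, \psi)}{\partial(x, y)} = j\,\frac{\partial \phi}{\partial u}\frac{\partial \psi}{\partial v}.$$
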

8. (a) If $x = r \sin\phi \cos\theta$, $y = r \sin\phi \sin\theta$, $z = r \cos\phi$, find

$$\frac{\partial(x, y, z)}{\partial(r, \phi, \theta)}.$$

(b) What is the geometric nature of an r-surface? a $\phi$-surface? a $\theta$-surface?

(c) Solve for r, $\phi$, $\theta$ in terms of x, y, z, assuming r > 0, $0 < \phi < \pi/2$, $0 < \theta < \pi/2$ for
convenience. Compute $\partial(r, \phi, \theta)/\partial(x, y, z)$ and verify that it is the reciprocal of the
Jacobian found in (a).


9. Discuss the three-dimensional system of curvilinear co-ordinates $(u, v, \theta)$, where
$x = r \cos\theta$, $y = r \sin\theta$,

$$z = \frac{\sin v}{\cosh u + \cos v}, \qquad r = \frac{\sinh u}{\cosh u + \cos v},$$

and $r^2 = x^2 + y^2$. Begin by discussing and sketching the u-curves and v-curves in an
rz-plane. The relevant equations to be obtained are

$$r^2 + z^2 - 2r \operatorname{ctnh} u + 1 = 0,$$
$$r^2 + z^2 + 2z \operatorname{ctn} v - 1 = 0.$$

Then rotate around the z-axis. Find the singular points of the co-ordinate system. The
co-ordinates $(u, v, \theta)$ are known as toroidal, or ring, co-ordinates. Describe the u-surfaces
and v-surfaces by name.

10. What is the analogue of (9.1-6) for a transformation x = f(u ) where there is just 
one variable in each set? 


9.6 / IDENTICAL VANISHING OF THE JACOBIAN. FUNCTIONAL DEPENDENCE

In this section we inquire into the state of affairs when the Jacobian determinant 
of a transformation is equal to zero throughout a region. First we consider 
briefly the significance of this phenomenon in the linear case. After that we 
move to nonlinear mappings from the plane to the plane, with reasoning which 
could be used to extend the theorems to higher dimensions. 

The two linear functions ax + by and cx + dy are said to be linearly dependent
in case there are two numbers $k_1$ and $k_2$, not both of which are zero, such that

$$k_1(ax + by) + k_2(cx + dy) = 0$$

for all x and y. Those who have had some linear algebra will recall that a necessary and
sufficient condition for this is that

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = 0.$$

This determinant is the Jacobian of the linear transformation

$$u = ax + by, \qquad v = cx + dy.$$

So the vanishing of the Jacobian of the transformation is equivalent to the
existence of two numbers $k_1$ and $k_2$, not both zero, such that

$$k_1 u + k_2 v = 0 \quad \text{for all } (x, y). \tag{9.6-1}$$

This means that the entire xy-plane is mapped into the straight line (9.6-1)
through the origin in the uv-plane.

This should lead us to wonder if there is some analogous theorem in the case
of nonlinear transformations. Let us begin with the transformation

$$u = F(x, y), \qquad v = G(x, y) \tag{9.6-2}$$

and see what conclusion we can deduce from the hypothesis that $\partial(F, G)/\partial(x, y)$
vanishes identically throughout some region R of the xy-plane. Since our
equations here are nonlinear we would not expect to find that (9.6-2) maps R
into a straight line but rather, perhaps, into a curve. This is roughly what the
following theorem says, at least locally.


THEOREM III. Suppose that F and G are continuously differentiable functions
of x and y in some open set R and that

$$\frac{\partial(F, G)}{\partial(x, y)} = 0 \quad \text{for all } (x, y) \text{ in } R. \tag{9.6-3}$$

Suppose further that $(x_0, y_0)$ is a point of R at which either $\partial F/\partial x \neq 0$ or
$\partial F/\partial y \neq 0$. Let $u_0 = F(x_0, y_0)$. Then there exists an interval of the u-axis centered at $u_0$
and a function $\phi(u)$ defined thereon, such that

$$G(x, y) = \phi[F(x, y)] \tag{9.6-4}$$

throughout a neighborhood of $(x_0, y_0)$.


Before taking up the proof, we comment on the theorem. First, like all our
theorems on nonlinear functions, this is a “local” theorem: it gives a result
holding in neighborhoods of certain points rather than throughout the entire
domain of the transformation. Second, we could just as well have taken $\partial G/\partial x$ or
$\partial G/\partial y$ to be different from zero instead of $\partial F/\partial x$ or $\partial F/\partial y$. This would have led to the
conclusion

$$F(x, y) = \psi[G(x, y)] \tag{9.6-5}$$

instead of (9.6-4). Geometrically speaking, (9.6-4) means that some neighborhood
of $(x_0, y_0)$ is mapped into a curve given by $v = \phi(u)$, whereas (9.6-5) gives a
curve of the form $u = \psi(v)$. So unless all four of the first partial derivatives, $F_1$,
$F_2$, $G_1$, and $G_2$ vanish at $(x_0, y_0)$, some entire neighborhood of this point is
mapped into a curve in the uv-plane. We could investigate conditions under
which these curve segments in the uv-plane join up to form one big curve, but
this would lead to difficulties which we prefer to avoid.
Proof of Theorem III. Consider the equation u = F(x, y). Suppose $\partial F/\partial x \neq 0$ at
$(x_0, y_0)$; then the implicit-function theorem guarantees the existence of a solution
x = f(u, y) giving all triples (x, y, u) near $(x_0, y_0, u_0)$ for which u = F(x, y).
Moreover,

$$\frac{\partial f}{\partial y} = -\frac{\partial F/\partial y}{\partial F/\partial x}. \tag{9.6-6}$$

All this applies when the differences $x - x_0$, $y - y_0$, $u - u_0$ are sufficiently small.
Consider the function G(x, y) as a function of u and y, with x = f(u, y). The
partial derivative with respect to y is

$$\frac{\partial G}{\partial x}\frac{\partial f}{\partial y} + \frac{\partial G}{\partial y} = \frac{\partial(F, G)/\partial(x, y)}{\partial F/\partial x} = 0$$

because of (9.6-6) and (9.6-3). Thus G(f(u, y), y) is actually independent of y.
Let us write $\phi(u) = G(f(u, y), y)$. Since x = f(u, y) is equivalent to u = F(x, y)
for the values of the variables here in question, $\phi(u) = G(f(u, y), y)$ is equivalent
to (9.6-4). If we assume $\partial F/\partial y \neq 0$ instead of $\partial F/\partial x \neq 0$, similar reasoning again leads
to (9.6-4).

Example 2. The argument is illustrated by $F(x, y) = x^2 y^2$, $G(x, y) = -xy$,
$x_0 = y_0 = 1$, $f(u, y) = \sqrt{u}/y$, $\phi(u) = -\sqrt{u}$. Observe that, with these same
functions F, G, but with $x_0 = -1$, $y_0 = 1$, we obtain $f(u, y) = -\sqrt{u}/y$, $\phi(u) = \sqrt{u}$.
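
The local character of the theorem shows up clearly when this example is evaluated at sample points. In the sketch below, which is an illustration with hypothetical names, the Jacobian is computed from the exact partial derivatives, and `phi_near` selects the branch of $\phi$ according to the sign of $x_0 y_0$. That selection rule is our own generalization of the two cases stated in the example and assumes $x_0 y_0 \neq 0$.

```python
import math


def F(x, y):
    return x * x * y * y


def G(x, y):
    return -x * y


def jacobian_FG(x, y):
    """F_x G_y - F_y G_x, using the exact partial derivatives of F and G."""
    Fx, Fy = 2 * x * y * y, 2 * x * x * y
    Gx, Gy = -y, -x
    return Fx * Gy - Fy * Gx


def phi_near(x0, y0):
    """Branch phi with G = phi(F) near (x0, y0); assumes x0 * y0 != 0."""
    sign = -1.0 if x0 * y0 > 0 else 1.0
    return lambda u: sign * math.sqrt(u)
```

Near (1, 1) the function returned by `phi_near(1, 1)` reproduces G from F, while near (-1, 1) the opposite sign is needed. No single choice of $\phi$ serves both neighborhoods, although the Jacobian vanishes at every point of the plane.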

Theorem III can be generalized to n functions of n variables. For n = 3 the
hypotheses that

$$\frac{\partial(F, G, H)}{\partial(x, y, z)} = 0 \tag{9.6-7}$$

in a neighborhood of $(x_0, y_0, z_0)$, and that at least one of the Jacobians

$$\frac{\partial(F, G)}{\partial(x, y)}, \qquad \frac{\partial(F, G)}{\partial(y, z)}, \qquad \frac{\partial(F, G)}{\partial(z, x)}$$

is not zero at this point, lead to a conclusion of the form

$$H(x, y, z) = \phi(F(x, y, z), G(x, y, z)), \tag{9.6-8}$$

where $\phi(u, v)$ is defined near $u_0 = F(x_0, y_0, z_0)$, $v_0 = G(x_0, y_0, z_0)$.

Theorem III says that under certain conditions the identical vanishing of the
Jacobian implies that neighborhoods (two-dimensional subsets) in the xy-plane
are mapped into curves (one-dimensional subsets) in the uv-plane. Our next
theorem will be somewhat like a converse of this. We shall assume that for all
(x, y) in R,

$$\Phi[F(x, y), G(x, y)] = 0. \tag{9.6-9}$$

Notice that this includes as special cases both $G(x, y) = \phi[F(x, y)]$ and
$F(x, y) = \psi[G(x, y)]$. When such a function as $\Phi$ exists, F and G are said to be functionally
dependent in R. The term is not completely defined until we specify what other
properties $\Phi$ must have. We shall require that it be continuous.

THEOREM IV. Suppose that

$$u = F(x, y), \qquad v = G(x, y) \tag{9.6-10}$$

is a continuously differentiable mapping of a region R of the xy-plane into the
uv-plane. Let R' be a region in the uv-plane which contains the image of R.
Let $\Phi(u, v)$ be a function which is defined and continuous in R', such that
there is no neighborhood of a point in R' throughout which $\Phi(u, v) = 0$.
Finally, suppose that

$$\Phi(F(x, y), G(x, y)) = 0 \tag{9.6-11}$$

whenever (x, y) is a point of R. Then the Jacobian $\partial(F, G)/\partial(x, y)$ vanishes at all
points of R.

Proof. Suppose the equation (9.6-3) fails to hold at some point $(x_0, y_0)$ in R.
Then the mapping (9.6-10) is locally one-to-one, and maps a neighborhood of
$(x_0, y_0)$ onto a neighborhood of the corresponding point $(u_0, v_0)$. In view of this,
(9.6-11) means that $\Phi(u, v) = 0$ throughout a neighborhood of $(u_0, v_0)$. Since this
violates the hypothesis, (9.6-3) must hold at every point of R.

Example 1. Theorem IV is illustrated by

$$F(x, y) = \sqrt{y}\,\sin x, \qquad G(x, y) = y \cos^2 x - y, \qquad \Phi(u, v) = u^2 + v,$$

with R the half-plane y > 0, R' the entire uv-plane.
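
For the reader who wishes to see the conclusion of the theorem confirmed directly in this example, the computation can be sketched as follows. It is supplementary to the text and uses only the functions F, G, $\Phi$ given above.

```latex
% The relation (9.6-11): since G = y(\cos^2 x - 1) = -y \sin^2 x,
\Phi(F, G) = F^2 + G = y \sin^2 x - y \sin^2 x = 0 \qquad (y > 0).

% First partial derivatives:
\frac{\partial F}{\partial x} = \sqrt{y}\,\cos x, \qquad
\frac{\partial F}{\partial y} = \frac{\sin x}{2\sqrt{y}}, \qquad
\frac{\partial G}{\partial x} = -2y \sin x \cos x, \qquad
\frac{\partial G}{\partial y} = -\sin^2 x .

% The Jacobian vanishes identically in R:
\frac{\partial(F, G)}{\partial(x, y)}
  = \sqrt{y}\,\cos x \,(-\sin^2 x)
    - \frac{\sin x}{2\sqrt{y}}\,(-2y \sin x \cos x)
  = -\sqrt{y}\,\sin^2 x \cos x + \sqrt{y}\,\sin^2 x \cos x = 0 .
```

Note also that $\Phi(u, v) = u^2 + v$ vanishes only on the parabola $v = -u^2$, so it is zero throughout no neighborhood, as the hypothesis of the theorem requires.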

When a pair of functions F(x, y), G(x, y) are such that the conditions stated 
in the hypotheses of Theorem IV are satisfied, F and G are said to be 
functionally dependent in the region R. The functional dependence is expressed 
by (9.6-11). 

EXERCISES 

1. Are the functions x + y, x 2 + y 2 functionally dependent in the neighborhood of any 
point? 

2. Under what condition will the linear functions ax + by, cx + dy be functionally 
dependent? 

3. (a) Let F(x, y, z) = x + y - z, G(x, y, z) = x - y + z, $H(x, y, z) = x^2 + y^2 + z^2 - 2yz$.
Verify that (9.6-7) holds in this case. Solve the equations u = F(x, y, z), v = G(x, y, z) for
x and y in terms of u, v, z, and substitute the solutions for x and y in H(x, y, z). Observe
that the result is independent of z. From this work find the function $\phi(u, v)$ such that
(9.6-8) holds.

(b) Show that this line of reasoning is applicable in the general case, provided

$$\frac{\partial(F, G)}{\partial(x, y)} \neq 0.$$


4. Show in each case that the functions are functionally dependent, and find the way
in which the third function depends on the first two. Use the method of Exercise 3.

(a) $u = x + y + z$, $v = xy + yz + zx$, $w = x^2 + y^2 + z^2$.

(b) $u = x/(y - z)$, $v = y/(z - x)$, $w = z/(x - y)$.

5. Without using the inverse function theorem, prove the variation on Theorem IV
obtained by replacing the third sentence of that theorem by the following: Let $\Phi(u, v)$ be
a differentiable function in R' for which $\Phi_1^2 + \Phi_2^2 > 0$ at each point of R'. Hint: All you
need is the chain rule and the fact that a homogeneous linear system has a nontrivial
solution if and only if the coefficient determinant is zero.


MISCELLANEOUS EXERCISES 

1. (a) Find the inverse of the transformation

$$u = \frac{x}{r^2}, \qquad v = \frac{y}{r^2}, \qquad w = \frac{z}{r^2}, \qquad \text{where } r^2 = x^2 + y^2 + z^2.$$

(b) What are the u-surfaces?

(c) Calculate $\dfrac{\partial(u, v, w)}{\partial(x, y, z)}$.

2. If $t = r \cos\phi$, $x = r \sin\phi \cos\psi$, $y = r \sin\phi \sin\psi \cos\theta$, $z = r \sin\phi \sin\psi \sin\theta$, show
that $t^2 + x^2 + y^2 + z^2 = r^2$ and find

$$\frac{\partial(t, x, y, z)}{\partial(r, \phi, \psi, \theta)}.$$

This indicates how spherical co-ordinates may be introduced in four-dimensional
space.

3. The notations in this exercise are those of the discussion preceding Example 2 in
§9.1. Show that the set E, which is the image of the open rectangle R under the inverse
transformation, is connected. Hint: Let A denote that subset of E consisting of $(u_0, v_0)$
and those points of E which can be reached from $(u_0, v_0)$ by a polygonal path made up of
finitely many line segments and lying altogether in E. A is not empty. Let B be the part of
E which is not in A. We want to show that B is empty. Show that A is open and B must
be open. Then T(A) and T(B) are two disjoint open sets whose union is R. By §5.1,
Example 5, they cannot both be nonempty. Therefore T(B) is empty, so B is empty and
E is connected.



10 / VECTORS AND VECTOR FIELDS


10 / PURPOSE OF THE CHAPTER 

This chapter has two purposes. One is to prepare the way for Chapter 11, in 
which we study linear transformations from one Euclidean vector space to 
another. For this purpose we need to explain the basic notions about vectors in 
Euclidean space and the algebra of Euclidean vector spaces. The second 
purpose is to concentrate on the notions of gradient, divergence, and curl in 
relation to the notions of scalar fields and vector fields. Much of this second 
purpose of the chapter is related to the use of vector-valued functions in the 
study of curves and surfaces in Chapter 14 and the study of the divergence 
theorem and Stokes’ theorem in Chapter 15. Nearly all of the important 
applications of these theorems arise in connection with the reduction of “applied 
problems” (such as problems in electrostatics, electricity and magnetism,
elasticity, gravitational theory, diffusion, and hydrodynamics) to mathematical
problems formulated in terms of partial differential equations.

Much of the usefulness of vectors comes from the fact that we are able to 
avoid cumbersome and perhaps irrelevant references to a point by the 
specification of its co-ordinates in some co-ordinate system. Instead, we identify 
the point by a vector. The convenience in being able to use one symbol instead 
of several for a mathematical object is significant. But there is also another 
important consideration: that of achieving the clarity of thought and under- 
standing that comes from expressing things in a way that makes no reference to 
a co-ordinate system. 


10.1 / VECTORS IN EUCLIDEAN SPACE 

In §2.6 we talked about the “axis of reals” as a straight line representing the real 
number system, each point being thought of as a real number. We call this line 
the one-dimensional Euclidean space R, or R 1 . The origin of this space is the 
point representing number zero. 

The two-dimensional Euclidean space R 2 is the plane of plane analytic 
geometry, with points (x, y) identified by use of a Cartesian co-ordinate system 
consisting of an x-axis and a y-axis intersecting at right angles. Sometimes we 
denote points by $(x_1, x_2)$ instead of (x, y). The origin in $R^2$ is the point (0, 0).

The Euclidean space R 3 is the three-dimensional space of points (x, y, z), 
where the three co-ordinate axes are mutually perpendicular. The origin in R 3 is 
the point (0, 0, 0).

There is, of course, a point of view about Euclidean geometry (of three
dimensions, let us say, to be specific), in which the geometric theory is con- 
structed without an origin and a co-ordinate system, the fundamental notions 
and theorems being developed from assumptions about points, lines, planes, and 
the use of distance and the concepts of parallelism and perpendicularity. In this 
aspect of Euclidean geometry there is no particular point to be given special 
recognition as “the origin” and there are no particular lines to be given special 
status as co-ordinate axes. We shall use this “co-ordinate-free” point of view 
from time to time and we shall take advantage of our common familiarity with 
what we may call the “physical reality” of three-dimensional space, the space in 
which we live and about which we have some useful intuitive perceptions based 
on experience. But we shall build our systematic treatment of vectors in 
Euclidean space on the foundation provided by R 3 as a set of things called either 
points or vectors. The same thing can be done for R 2 as the basic model for 
Euclidean plane geometry (when treated analytically). And then it is readily seen 
how to make the generalization to R n , where n can be any positive integer. 

Let us denote elements of $R^3$ by $A = (A_1, A_2, A_3)$, $B = (B_1, B_2, B_3)$, and so on.
We shall call them vectors, using boldface type for the symbols. We call $A_1$, $A_2$,
$A_3$ the components of A. They are simply the co-ordinates of A if we think of it
as a point (which we are free to do, of course). The reason for using the word
vector is that we are going to introduce definitions of algebraic operations on the
elements of $R^3$ which make it into what is called a vector space in the technical
terminology of linear algebra. It is a common usage to represent A visually as an
arrow from (0, 0, 0) to $(A_1, A_2, A_3)$. If P is the tip of the arrow (see Fig. 68), then
P is A when we think of A as a point.

There is an algebra of vectors in R 3 which rests on addition and subtraction 
of vectors and multiplication of vectors by real numbers (which are often called 
scalars ). The vector (0, 0, 0) is called the zero vector, and denoted by 0. 

The vector sum of A and B is defined by 

$A + B = (A_1, A_2, A_3) + (B_1, B_2, B_3) = (A_1 + B_1, A_2 + B_2, A_3 + B_3).$ (10.1-1)

Fig. 68.

The vector difference is defined by

$A - B = (A_1 - B_1, A_2 - B_2, A_3 - B_3).$ (10.1-2)

Multiplying A by a scalar c is defined by

$cA = c(A_1, A_2, A_3) = (cA_1, cA_2, cA_3).$ (10.1-3)

It is easily verified that the following algebraic rules are valid: 


$A + B = B + A$ (commutative law) (10.1-4)

$(A + B) + C = A + (B + C)$ (associative law) (10.1-5)

$A + 0 = A$ (10.1-6)

$c(A + B) = cA + cB$ (distributive law) (10.1-7)

$(a + b)A = aA + bA$ (distributive law) (10.1-8)

$a(bA) = (ab)A$ (10.1-9)

$1A = A$ (10.1-10)

$0A = 0$ (10.1-11)


Note that the zero on the left in (10.1-11) is a number, while the zero on the right
is a vector. Observe also that A − B is the same as A + (−1)B, that A − A = 0, and
that the equation X + B = A can be solved uniquely for X, the solution being
X = A − B. We regularly write −A for (−1)A. Also, we sometimes write A/c for
(1/c)A.

These algebraic rules are so fully consistent with the ordinary rules in the 
algebra of real numbers that we find it easy to become proficient in vector 
algebra. Note, however, that we have not, in the foregoing, defined how to form 
the product of two vectors as another vector. Instead, we define what is called 
the scalar product of two vectors. The usual notation for it employs a dot 
between the factors, and for that reason A • B is called the dot product of A and 
B. It is also sometimes called the inner product. The dot product is a scalar . The 
definition is made in terms of the components: 

$A \cdot B = A_1 B_1 + A_2 B_2 + A_3 B_3.$ (10.1-12)

It is easily verified that the following rules are valid: 

$A \cdot B = B \cdot A$ (commutative law) (10.1-13)

$A \cdot (B + C) = A \cdot B + A \cdot C$ (distributive law) (10.1-14)

$(cA) \cdot B = c(A \cdot B)$ (10.1-15)

It follows from the choice c = 0 in (10.1-15) that 0 • B = 0. However, it can
happen that A • B = 0 even though neither A nor B is the zero vector. For
example, (1, 1, 1) • (1, -2, 1) = 0.

We see from (10.1-12) that

$(A \cdot A)^{1/2} = (A_1^2 + A_2^2 + A_3^2)^{1/2}$

is the length of the vector A, that is, the distance from the origin to the tip of A 
(or to A, thought of as a point). We shall denote the length of A by ||A|| and call it 
the norm of A. Thus 

$\|A\| = (A \cdot A)^{1/2}.$ (10.1-16)

The only vector with zero length is 0. We say that each nonzero vector
determines (or has) a direction in $R^3$. If A ≠ 0 and B ≠ 0, we say that A and B
have the same direction if one is a positive multiple of the other, that is, if B = cA, where
c > 0. When B is a negative multiple of A we say that A and B have opposite 
directions. 
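
The definitions (10.1-1) to (10.1-16) translate directly into a few lines of code. The following is only a minimal sketch, assuming that a vector is stored as a Python tuple of numbers; the function names `add`, `scale`, `sub`, `dot`, and `norm` are our own illustrative choices and are not part of the text.

```python
import math

def add(a, b):
    """Component-wise sum A + B, as in (10.1-1)."""
    return tuple(x + y for x, y in zip(a, b))

def scale(c, a):
    """Scalar multiple cA, as in (10.1-3)."""
    return tuple(c * x for x in a)

def sub(a, b):
    """Difference A - B, computed as A + (-1)B."""
    return add(a, scale(-1, b))

def dot(a, b):
    """Dot product A . B, as in (10.1-12)."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Length ||A|| = (A . A)^(1/2), as in (10.1-16)."""
    return math.sqrt(dot(a, a))

A = (1, 1, 1)
B = (1, -2, 1)
print(add(A, B))        # component-wise sum
print(sub(A, B))        # component-wise difference
print(dot(A, B))        # zero, although neither A nor B is the zero vector
print(norm((1, 2, 2)))  # length of the vector (1, 2, 2)
```

Because the functions never refer to the number of components, the same sketch serves unchanged for tuples of any common length.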

If we subject every vector in R 3 to a transformation by adding to each vector 
the same nonzero vector C, so that every A is transformed into A + C, we call 
this a translation of the space. Every point is carried into a new point which is a 
distance ||C|| in the direction of C from the original point. 

There is a convenient way to visualize addition and subtraction of vectors. 
Let A and B be two nonzero vectors, and picture them as arrows emanating from 
the origin. If B is not a multiple of A the two vectors determine a unique plane. 
The sum C = A + B is then the vector emanating from the origin which forms the 
diagonal of the parallelogram of which A and B are adjacent sides. Another way 
of describing the situation intuitively is as follows: Displace B by a translation 
which brings its initial point (the origin) into coincidence with the terminal point 
(tip) of A. The sum A + B is then the vector from the initial point of A to the 
terminal point of the displaced vector B (see Fig. 69). This mode of representing
vector addition gives rise to the name “the parallelogram
law of addition.” If B has the same or opposite direction 
as A, the addition of B to A can be visualized by the same 
process of displacement of B: A+B extends from the 
initial point of A to the tip of the displaced vector B. 

Vector subtraction can also be displayed visually by 
use of the parallelogram law. See Fig. 70 and recall that 
A — B is that vector which, when added to B, gives A. 
Multiplication by scalars can likewise be displayed visu- 
ally. See Fig. 71. 



Fig. 70.

Fig. 71.


The dot product A • B can also be given a geometric interpretation that can 
be displayed visually. To explain this we use the law of cosines and properties of 
the dot product. We know from (10.1-16) that 

$\|A - B\|^2 = (A - B) \cdot (A - B).$

But, by the algebraic rules,

$(A - B) \cdot (A - B) = (A - B) \cdot A - (A - B) \cdot B = A \cdot A - B \cdot A - A \cdot B + B \cdot B = \|A\|^2 - 2A \cdot B + \|B\|^2.$

Therefore

$\|A - B\|^2 = \|A\|^2 + \|B\|^2 - 2A \cdot B.$ (10.1-17)

Now consider Fig. 72, in which θ is the angle
between the vectors A and B. (Here we assume
that neither A nor B is 0.) The triangle with
vertices at 0 and the tips of A and B has the side
opposite θ as a line segment whose length is the
same as that of A − B. By the law of cosines

$\|A - B\|^2 = \|A\|^2 + \|B\|^2 - 2\|A\|\,\|B\| \cos\theta.$ (10.1-18)

On comparing (10.1-18) with (10.1-17) we see that

$A \cdot B = \|A\|\,\|B\| \cos\theta.$ (10.1-19)

This can be expressed as follows: The dot product A • B is equal to the product
of the length of B times the signed projection of A on the directed line of the vector
B. From (10.1-19) it is clear that, when neither A nor B is 0, A • B = 0 if and only if
cos θ = 0, that is, if and only if A is perpendicular to B.
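
Formula (10.1-19) can be solved for the angle between two nonzero vectors. The sketch below is one possible way to do this numerically; the function name `angle` and the clamping step (a guard against rounding error, not something required by the theory) are assumptions of ours.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def angle(a, b):
    """Angle in radians between nonzero vectors a and b, from (10.1-19)."""
    c = dot(a, b) / (norm(a) * norm(b))
    # Clamp to [-1, 1] so that rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, c)))

print(angle((1, 0, 0), (1, 1, 0)))   # close to pi/4
print(angle((1, 1, 1), (1, -2, 1)))  # pi/2, because the dot product is 0
```

The second call illustrates the perpendicularity criterion: the dot product vanishes, so the computed angle is a right angle.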

Fig. 72.

Sometimes nonzero vectors in Euclidean space of three dimensions are
defined purely geometrically as directed line segments. Thus, if P and Q are
distinct points, the ordered pair of points P, Q determines a directed line
segment from P to Q and is called a vector, denoted by PQ. If R, S is another
ordered pair of points such that PQ and RS have the same length and direction,
PQ and RS are said to be equal (or, sometimes, equivalent) vectors. This means
that RS can be brought into coincidence with PQ by a translation of the whole 
space whereby R is moved along a straight line to P, and S is likewise moved, in 
the same direction, to Q. In physics a vector is often called a free vector if it is 
considered to be the same after any translation of the space. In our presentation 
of vectors in R 3 , however, we think of all vectors as having their initial point at 
the origin. With this mode of thinking about vectors we call R 3 a Euclidean 
vector space. As mentioned previously, we find it convenient to refer to a vector 
A as either a vector or a point, so that it is needless to have a separate notation 
for the point that is the tip of A. 


10.11 / ORTHOGONAL UNIT VECTORS IN R 3 

Let a fixed rectangular co-ordinate system be chosen with origin O. It is 
conventional, particularly in dealing with physical applications, to work with 
right-handed co-ordinate systems, and we shall ordinarily adhere to this con- 
vention. Now let i, j, k be vectors, each of unit length, in the directions of the 
positive x, y, and z axes, respectively (see Fig. 73). It is clear that if A is any 
vector, we can express it in the form 

$A = A_1 i + A_2 j + A_3 k,$ (10.11-1)

where $A_1$, $A_2$, $A_3$ are the components of A (see Fig. 74). We call i, j, k the
fundamental orthonormal triad associated with this particular co-ordinate system.
The word “orthonormal” is a combination of “orthogonal” and “normal.”
The vectors i, j, k form an orthogonal set; that is, they are mutually
perpendicular. A vector is said to be normalized, or normal, if it is of unit length. The
orthonormal character of the triad i, j, k is expressed by the relations

$i \cdot i = j \cdot j = k \cdot k = 1, \quad i \cdot j = j \cdot k = k \cdot i = 0.$ (10.11-2)

Fig. 73.

EXERCISES

1. Find A + B, A - B, and A • B in each case. Are A and B collinear, perpendicular, or 
neither? 

(a) A = (1, 1, -1), B = (3, —2, -1). 

(b) A = (1,4, 3), B = (4,2, -4). 

(c) A = (2, -1,1), B = (3,-4, -4). 

(d) A = (6, 4, -2), B = (-9,-6, 3). 

2. Find ||A|| in each case. 

(a) A = 2i + 3 j + 6k. (b) A = 4i + 2j-4k. (c) A = 2i + j-2k. (d) A = 4i + 3j. 

3. If the angle between A and B is θ, where A = (-2, -2, 1) and B = (1, -2, 2), find
cos θ.

4. Let A, B, C, D be four vectors in the vector space with origin O, and suppose that
B - A = C - D. Show that the tips of these four vectors are consecutive vertices of a
parallelogram.

5. Let A, B, C be noncoplanar vectors in the vector space with origin O. What is the
vector A + B + C in relation to the parallelepiped of which A, B, C are edges?

6. Let A = (cos θ, sin θ, 0), B = (cos φ, sin φ, 0), where 0 ≤ φ ≤ θ ≤ 2π. Draw a figure
showing the positions of A and B in the xy-plane. Use (10.1-12) and (10.1-19) to obtain the
trigonometric identity for cos(θ − φ).

7. If A and B are of unit length, and θ is the angle between them, express ||A − B|| as a
function of θ.

8. Let A be a vector of unit length, and let B be any vector. Let C = (B • A)A,
D = B − C. Prove that D • A = 0. Draw a figure showing the relation of A, B, C, D. Work
out the special cases (a) A = i, B = 5i + 2j; (b) A = (1/5)(3i + 4j), B = 5j.

10.12 / THE VECTOR SPACE R^n

By a natural generalization of our definition of $R^1$, $R^2$, and $R^3$ we define $R^n$, for
any positive integer n, as the set of all ordered n-tuples $(x_1, \ldots, x_n)$ of n real
numbers. We call the elements of $R^n$ either points or vectors. Here we shall
usually call them vectors, because we are going to be extending the notions that 
were introduced in §10.1 to enable us to regard R n as a Euclidean vector space 
of n dimensions. When n > 3 we are in general deprived of our useful resort to 
physical intuition and the visualization of vector relationships in “ordinary” 
space of one, two, or three dimensions. However, we continue to use geometric 
language, even though we deal with matters analytically. 

We use the notations $x = (x_1, \ldots, x_n)$, $y = (y_1, \ldots, y_n)$, and so on, with a vector
designated by a single letter in boldface type and the components of the vector 
designated by the same letter, with subscripts, but not in boldface type. We can 
use either capital or lower case letters for vectors. 

We define x + y and cx by the natural generalizations of (10.1-1) and 
(10.1-3), adding vectors by adding corresponding components, and forming cx 
by multiplying each component of x by c. We also extend the definition of the 
dot product, defining 


$x \cdot y = x_1 y_1 + \cdots + x_n y_n.$ (10.12-1)

The length of x, called its norm, is denoted by ||x|| and defined by

$\|x\| = (x_1^2 + \cdots + x_n^2)^{1/2}.$ (10.12-2)

From (10.12-1) we see that

$\|x\|^2 = x \cdot x.$ (10.12-3)

With (0, 0, ...,0) as the vector 0, the same algebraic laws hold for R n as 
those given for R 3 in (10.1-4) to (10.1-11) inclusive and (10.1-13) to (10.1-15) 
inclusive. 

The following inequality, known as Cauchy’s inequality, is very important: 

$\left| \sum_{i=1}^{n} x_i y_i \right| \le \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} \left( \sum_{i=1}^{n} y_i^2 \right)^{1/2}.$ (10.12-4)

One way of obtaining Cauchy’s inequality was indicated in Exercise 29, §6.8. For
another way see Exercise 4 at the end of this section. We can rewrite (10.12-4) as

$|x \cdot y| \le \|x\|\,\|y\|.$ (10.12-5)

From this it is easy to show that

$\|x + y\| \le \|x\| + \|y\|.$ (10.12-6)

We leave the derivation of (10.12-6) to the student; see Exercise 5. This 
inequality is called the triangle inequality. It corresponds to the geometric 
assertion that in a triangle the length of one side is never larger than the sum of 
the lengths of the other two sides. 
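
A numerical experiment is no substitute for a proof, but it can make Cauchy's inequality and the triangle inequality more tangible. The sketch below tests both on randomly chosen vectors in five dimensions; the helper name `check_inequalities`, the small tolerance for rounding error, and the choice of dimension and sample size are all assumptions of ours.

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def check_inequalities(x, y, tol=1e-9):
    """Return (cauchy_ok, triangle_ok) for one pair of vectors in R^n."""
    s = tuple(a + b for a, b in zip(x, y))
    cauchy_ok = abs(dot(x, y)) <= norm(x) * norm(y) + tol
    triangle_ok = norm(s) <= norm(x) + norm(y) + tol
    return cauchy_ok, triangle_ok

random.seed(0)  # fixed seed so that the run is repeatable
n = 5
results = []
for _ in range(1000):
    x = tuple(random.uniform(-10, 10) for _ in range(n))
    y = tuple(random.uniform(-10, 10) for _ in range(n))
    results.append(check_inequalities(x, y))

# True when every sampled pair satisfies both inequalities.
print(all(c and t for c, t in results))
```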

We define distance in $R^n$ by the natural generalization of the formula in $R^3$.
The distance d(x, y) between x and y (thought of as points) is defined as

$d(x, y) = \left[ \sum_{i=1}^{n} (x_i - y_i)^2 \right]^{1/2} = \|x - y\|.$ (10.12-7)

Two vectors x, y in $R^n$ are said to be orthogonal if x • y = 0. The naturalness
of this definition is seen from the discussion accompanying (10.1-19). 

There is a set of n mutually orthogonal vectors of unit length in R n entirely 
analogous to the vectors i, j, k in R 3 that were introduced in §10.11. We define 

$e_1 = (1, 0, 0, \ldots, 0),$
$e_2 = (0, 1, 0, \ldots, 0),$
$\ldots$
$e_n = (0, 0, \ldots, 0, 1).$ (10.12-8)

It is clear that $\|e_i\| = 1$ and $e_i \cdot e_j = 0$ if $i \ne j$, so we call $e_1, \ldots, e_n$ an orthonormal
set. We see at once that we can write

$x = x_1 e_1 + \cdots + x_n e_n,$ (10.12-9)

so that each vector x is a linear combination of the $e_i$’s, the coefficients being the
components of x. This representation of x is the analogue of the representation
of A in (10.11-1).

Our reason for saying that $R^n$ is n-dimensional is closely related to (10.12-9).
To explain the concept of dimensionality we must first speak about the
notions of linear dependence and linear independence of a set of vectors. A set
of k vectors $u_1, \ldots, u_k$ is called linearly dependent if there is a set of k scalars
$c_1, \ldots, c_k$, not all zero, such that

$c_1 u_1 + \cdots + c_k u_k = 0.$ (10.12-10)

If $c_j \ne 0$, we can solve the foregoing equation for $u_j$, expressing it as a linear
combination of the other $u_i$’s. On the other hand, if the set $u_1, \ldots, u_k$ is not
linearly dependent, we call it linearly independent. In this case, an equation of
the form (10.12-10) cannot be valid except when all the $c_i$’s are zero.

The vectors $e_1, \ldots, e_n$ are linearly independent, for

$c_1 e_1 + \cdots + c_n e_n = (c_1, \ldots, c_n) = 0$

if and only if $c_1 = \cdots = c_n = 0$. The vectors $e_1, \ldots, e_n$ form what is called a basis
for $R^n$; this means that the vectors are linearly independent and that every
vector in $R^n$ is a linear combination of the $e_i$’s. There are other sets forming a
basis for $R^n$; we call $e_1, \ldots, e_n$ the standard basis for $R^n$. In particular, i, j, k is
the standard basis for $R^3$. It is a special case of a general theorem of linear
algebra that every basis for $R^n$ consists of exactly n vectors. It is for this reason
that we say that $R^n$ is n-dimensional.

It is easy to see that a set of k vectors is linearly dependent if one of them is 
zero, and that an orthonormal set of k vectors is linearly independent (Exercises 
1 and 2). As a general test for linear dependence we have the following theorem. 


THEOREM I. A necessary and sufficient condition that the vectors $v_1, \ldots, v_k$ be
linearly dependent is that the determinant G defined by

$G = \begin{vmatrix} v_1 \cdot v_1 & v_1 \cdot v_2 & \cdots & v_1 \cdot v_k \\ v_2 \cdot v_1 & v_2 \cdot v_2 & \cdots & v_2 \cdot v_k \\ \vdots & \vdots & & \vdots \\ v_k \cdot v_1 & v_k \cdot v_2 & \cdots & v_k \cdot v_k \end{vmatrix}$ (10.12-11)

be equal to zero. (An equivalent statement is that the vectors $v_1, \ldots, v_k$ are
linearly independent if and only if G ≠ 0.)


Proof. We have to show that G = 0 if and only if there exist scalars $c_1, \ldots, c_k$,
not all zero, such that

$c_1 v_1 + \cdots + c_k v_k = 0.$ (10.12-12)

If we suppose that the equation (10.12-12) holds for some set of $c_i$’s, then, by
forming dot products of the expression in (10.12-12) successively with
$v_1, v_2, \ldots, v_k$, we obtain the system of equations

$c_1 v_1 \cdot v_1 + c_2 v_1 \cdot v_2 + \cdots + c_k v_1 \cdot v_k = 0$
$c_1 v_2 \cdot v_1 + c_2 v_2 \cdot v_2 + \cdots + c_k v_2 \cdot v_k = 0$
$\ldots$ (10.12-13)
$c_1 v_k \cdot v_1 + c_2 v_k \cdot v_2 + \cdots + c_k v_k \cdot v_k = 0.$

On the other hand, if equations (10.12-13) are satisfied by a set of $c_i$’s, then so are
the equations that result when we multiply both sides of the first equation by $c_1$,
the second by $c_2$, and so on. But, by use of the algebraic rules governing dot
products, the equations thus obtained can be written

$c_1 v_1 \cdot (c_1 v_1 + \cdots + c_k v_k) = 0$
$c_2 v_2 \cdot (c_1 v_1 + \cdots + c_k v_k) = 0$
$\ldots$
$c_k v_k \cdot (c_1 v_1 + \cdots + c_k v_k) = 0.$

On adding these equations, we conclude that

$(c_1 v_1 + \cdots + c_k v_k) \cdot (c_1 v_1 + \cdots + c_k v_k) = 0,$

and hence that equation (10.12-12) is valid, for a vector is zero if and only if its dot
product with itself is zero (see (10.12-2) and (10.12-3)).

We see, therefore, that an equation of the form (10.12-12) holds true if and
only if the system of homogeneous linear equations (10.12-13) (with the $c_i$’s
regarded as unknowns) is satisfied. But, by the algebraic theory of simultaneous
linear equations, the homogeneous system (10.12-13) is satisfied by a set of $c_i$’s
that are not all zero if and only if the determinant of the system (which is the
determinant G in (10.12-11)) is equal to zero. This proves the theorem.

The determinant G is called the Gram determinant, or the Gramian, in honor 
of J. P. Gram, a mathematician of the nineteenth century. We shall presently see 
his name again. 
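
The test in Theorem I is easy to mechanize. The following sketch forms the array of dot products and evaluates its determinant by Gaussian elimination in exact rational arithmetic. The names `determinant`, `gramian`, and `linearly_dependent` are illustrative choices of ours, and the elimination routine is simply one standard way of evaluating a determinant, not a method prescribed by the text.

```python
from fractions import Fraction

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def determinant(rows):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # Look for a row at or below the diagonal with a nonzero entry.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # interchanging two rows changes the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def gramian(vectors):
    """Gram determinant G of (10.12-11) for a list of vectors."""
    return determinant([[dot(u, v) for v in vectors] for u in vectors])

def linearly_dependent(vectors):
    """Theorem I: the set is linearly dependent exactly when G = 0."""
    return gramian(vectors) == 0

# The third vector is the sum of the first two, so the set is dependent.
print(linearly_dependent([(1, 0, 1), (0, 1, 1), (1, 1, 2)]))
# Two vectors that are not multiples of each other are independent.
print(linearly_dependent([(1, 0, 0), (1, 1, 0)]))
```

Exact fractions are used so that the comparison with zero is reliable; with floating-point components one would have to compare against a tolerance instead.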

The members of a linearly independent set of vectors need not be unit
vectors, of course, and they need not be mutually orthogonal, but if we are given
a linearly independent set of k vectors $A_1, \ldots, A_k$, it is possible by a systematic
process to construct an orthonormal set of k vectors $v_1, v_2, \ldots, v_k$, each of which
is a linear combination of the $A_i$’s. (Here we are assuming k ≥ 2.) This process is
called the Gram-Schmidt process, in recognition
of the work of J. P. Gram and E. Schmidt, a
mathematician of the early twentieth century.

The idea of the process is quite simple; the
basic method can be interpreted geometrically by
dealing with two linearly independent vectors that
are not orthogonal. We denote them by C and D
and picture them in a plane. See Fig. 75.

Fig. 75.

Let E be defined by

$E = \dfrac{D \cdot C}{\|C\|^2}\, C.$ (10.12-14)

We call E the vector projection of D on C. Observe that $C/\|C\|$ is a unit vector in
the direction of C and that E is a multiple of this unit vector by the factor

$\dfrac{D \cdot C}{\|C\|}.$

According to (10.1-19) this factor is ||D|| times the cosine of the angle between C
and D; this justifies calling E the vector projection of D on C.

Now consider the vector D − E (i.e., D minus the vector projection of D on
C). It is orthogonal to C. This is clear in Fig. 75, but we can verify it analytically,
for, by (10.12-14),

$(D - E) \cdot C = D \cdot C - \dfrac{D \cdot C}{\|C\|^2}\, C \cdot C = 0,$

because $\|C\|^2 = C \cdot C$.
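
The orthogonality of D − E to C can also be checked on a concrete pair of vectors. In the sketch below the function name `project` and the sample vectors are our own choices; integer components are assumed so that `Fraction` can form the factor (D • C)/||C||² exactly.

```python
from fractions import Fraction

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def project(d, c):
    """Vector projection E of D on a nonzero vector C, as in (10.12-14).

    Integer components are assumed, so the factor is an exact fraction.
    """
    factor = Fraction(dot(d, c), dot(c, c))
    return tuple(factor * ci for ci in c)

C = (3, 0, 4)
D = (1, 2, 3)
E = project(D, C)
residual = tuple(di - ei for di, ei in zip(D, E))  # this is D - E

print(E)                 # a multiple of C
print(dot(residual, C))  # zero: D - E is orthogonal to C
```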

The Gram-Schmidt process utilizes repeatedly the process of subtracting
from a given vector each of its projections on a certain set of vectors. If
$A_1, \ldots, A_k$ is the given linearly independent set of vectors, we start by defining the
unit vector

$v_1 = \dfrac{A_1}{\|A_1\|}$

and let $B_2$ be $A_2$ minus the projection of $A_2$ on $v_1$:

$B_2 = A_2 - (A_2 \cdot v_1) v_1$

(recall that $\|v_1\| = 1$). Now $B_2$ is orthogonal to $v_1$, just as, above, D − E is
orthogonal to C. Also, $B_2$ is a linear combination of $A_1$ and $A_2$, and therefore is
not 0 (because $A_1$ and $A_2$ are linearly independent). Hence we can define the unit
vector

$v_2 = \dfrac{B_2}{\|B_2\|};$

it is orthogonal to $v_1$. Next, we define

$B_3 = A_3 - (A_3 \cdot v_1) v_1 - (A_3 \cdot v_2) v_2;$

here we have subtracted from $A_3$ its projection on $v_1$ and its projection on $v_2$. A
calculation shows that $B_3$ is orthogonal to $v_1$ and to $v_2$. Also, it is not zero (why?),
and so, if we define

$v_3 = \dfrac{B_3}{\|B_3\|},$

we see that $v_1$, $v_2$, $v_3$ form an orthonormal set. Moreover, $v_1$ is a multiple of $A_1$, $v_2$
is a linear combination of $A_1$ and $A_2$, and $v_3$ is a linear combination of $A_1$, $A_2$, and
$A_3$. It is now clear how to continue the process of obtaining more $v_i$’s as long as
there are still $A_i$’s from which to subtract their projections on $v_1, \ldots, v_{i-1}$.


Example. Show that the following four vectors in $R^4$ are linearly independent.
Then apply the Gram-Schmidt process to construct an orthonormal
set from the given vectors. The vectors are:

$A_1 = (1, 1, 1, 1)$, $A_2 = (0, -1, 0, -1)$, $A_3 = (0, 0, 1, 1)$, $A_4 = (1, -2, -2, 0).$ (10.12-15)

We illustrate the application of Theorem I. First we calculate

$A_1 \cdot A_1 = 4$, $A_1 \cdot A_2 = -2$, $A_1 \cdot A_3 = 2$, $A_1 \cdot A_4 = -3$,
$A_2 \cdot A_2 = 2$, $A_2 \cdot A_3 = -1$, $A_2 \cdot A_4 = 2$,
$A_3 \cdot A_3 = 2$, $A_3 \cdot A_4 = -2$,
$A_4 \cdot A_4 = 9.$

The Gramian is

$G = \begin{vmatrix} 4 & -2 & 2 & -3 \\ -2 & 2 & -1 & 2 \\ 2 & -1 & 2 & -2 \\ -3 & 2 & -2 & 9 \end{vmatrix}.$

By standard methods for calculating the value of a determinant we find that
G = 25. Because G ≠ 0 we conclude that the four vectors are linearly independent.
Therefore we can proceed with the Gram-Schmidt process. We list the results
by stages, leaving the detailed calculations to be verified by the student.

$v_1 = (\tfrac12, \tfrac12, \tfrac12, \tfrac12),$
$B_2 = A_2 + v_1 = (\tfrac12, -\tfrac12, \tfrac12, -\tfrac12), \quad v_2 = B_2,$
$B_3 = A_3 - v_1 = (-\tfrac12, -\tfrac12, \tfrac12, \tfrac12), \quad v_3 = B_3,$
$B_4 = A_4 + \tfrac32 v_1 - \tfrac12 v_2 + \tfrac12 v_3 = (\tfrac54, -\tfrac54, -\tfrac54, \tfrac54),$
$\|B_4\| = \tfrac52, \quad v_4 = (\tfrac12, -\tfrac12, -\tfrac12, \tfrac12).$

EXERCISES 

1. Show directly by the definition that $A_1, \ldots, A_k$ is a linearly dependent set of
vectors if at least one of the vectors is zero.

2. Show directly by the definition that a set of nonzero and mutually orthogonal
vectors is linearly independent.

3. Show that the vectors $A_1 = 2i$, $A_2 = 3i + 4j$, $A_3 = i + 2j + 3k$ are linearly independent,
and apply the Gram-Schmidt process to them.

4. (a) Deduce the inequality (10.12-4) in the form (10.12-5) with the aid of the
following suggestions. Consider

$\sum_{i=1}^{n} (x_i + t y_i)^2$

and express it as a quadratic polynomial in the real variable t. Since the polynomial can
never be negative, it cannot have two distinct real roots (why not?). What does this imply
about the expression under the radical sign in the quadratic formula for the roots of the
polynomial?

(b) Show from the foregoing considerations that the equality sign will hold in (10.12-5) if
and only if x is a multiple of y.

5. Deduce the inequality ||x + y|| ≤ ||x|| + ||y|| with the aid of Cauchy’s inequality in the
form |x • y| ≤ ||x|| ||y||. Under what condition (and only that) is it possible to have

||x + y|| = ||x|| + ||y||?

6. (a) Show that the vectors A = (1, 0, 1), B = (1, 1, -1), and C = (-1, 2, 1) are mutually
orthogonal.

(b) If V = (5, -2, 3), express V as a linear combination of A, B, and C. Hint: Try to find α,
β, γ so that V = αA + βB + γC and show that α can be found as a multiple of V • A.

7. Instead of using Theorem I to prove that the vectors in (10.12-15) are linearly
independent, one can also use the following general test: A set of vectors $A_1, A_2, A_3, A_4$ in
$R^4$ is linearly independent if and only if

$\begin{vmatrix} A_1 \cdot e_1 & A_2 \cdot e_1 & A_3 \cdot e_1 & A_4 \cdot e_1 \\ A_1 \cdot e_2 & A_2 \cdot e_2 & A_3 \cdot e_2 & A_4 \cdot e_2 \\ A_1 \cdot e_3 & A_2 \cdot e_3 & A_3 \cdot e_3 & A_4 \cdot e_3 \\ A_1 \cdot e_4 & A_2 \cdot e_4 & A_3 \cdot e_4 & A_4 \cdot e_4 \end{vmatrix} \ne 0,$

where $e_1, e_2, e_3, e_4$ are the standard basis vectors in $R^4$. Prove that this test is valid. Then
apply it to the vectors in (10.12-15).

8. Show that the dot product defined in (10.12-1) has the following five properties.

(a) x • y = y • x for all x and y.

(b) x • (y + z) = x • y + x • z for all x, y, and z.

(c) (cx) • y = c(x • y) for all x, y and all scalars c.

(d) x • x ≥ 0 for all x.

(e) x • x = 0 if and only if x = 0.


10.2 / CROSS PRODUCTS IN R 3 

In R 3 , there is another kind of vector multiplication, in which the product of two 
vectors is another vector. We shall define this product presently. It is called the 
cross product (or, sometimes, the vector product ), and is denoted by A x B. 

Definition. The cross product of a vector A by the vector B is defined as follows.
If either A or B is 0 we define the product to be 0:

$0 \times B = A \times 0 = 0.$

Otherwise, let θ be the angle between A and B. We define A × B to be the vector
of magnitude

$\|A \times B\| = \|A\|\,\|B\| \sin\theta$

whose line is perpendicular to the plane of A and B, and
whose direction is such that A, B, and A × B form a
right-handed system (see Fig. 76). If the vectors A, B lie along
the same line, they do not determine a plane. In this case
sin θ = 0, however, and so A × B = 0. Note that the magnitude
of A × B is equal to the area of the parallelogram of which A,
B are adjacent sides.

The motivation for this definition, and its usefulness, will be better under- 
stood by the student after he has seen the occurrence of the cross product in 
physical applications and in later mathematical developments. It has only very 
limited analogies with the ordinary product of two numbers. Moreover, the cross
product is something peculiar to vectors in three dimensions, having no analogue 
for vector spaces of dimension other than three. 

The principal algebraic rules governing the cross product are 

$A \times B = -(B \times A),$ (10.2-1)

$(cA) \times B = c(A \times B),$ (10.2-2)

$A \times (B + C) = (A \times B) + (A \times C).$ (10.2-3)

Multiplication is not commutative, but anticommutative, as we see by (10.2-1).
This law is apparent from the definition of the cross product, since B, A and
−(A × B) form a right-handed system. The law (10.2-2) is also apparent from the
definition of the cross product. The rule

$A \times (cB) = c(A \times B)$ (10.2-4)

can be deduced from (10.2-1) and (10.2-2). 

Now consider the proof of the distributive
law (10.2-3). The law obviously holds if A = 0, so
we consider the proof on the assumption that
A ≠ 0. If B is any vector, let B' denote the vector
projection of B on the plane perpendicular to A
through the origin (see Fig. 77). Clearly ||B'|| =
||B|| sin θ, and therefore A × B = A × B'. Now, projecting
in this manner, we see that B' + C' is the
projection of B + C. Therefore, instead of proving
(10.2-3), it is sufficient to prove

$A \times (B' + C') = A \times B' + A \times C'.$ (10.2-5)

The advantage here is that the vectors B', C' and B' + C' either are 0 or are
perpendicular to A.

Now let us consider a figure (Fig. 78) in
which A is perpendicular to the plane of the page 
and is directed out toward the student as he 
reads the page. We may assume that neither B' 
nor C' is 0, and also that these vectors are 
not collinear, since in these cases (10.2-5) is 
certainly true, as we readily see (if B' and C' 
are collinear, C' is a multiple of B', and (10.2-5) 
then follows from (10.2-4)). Now, if V is 
any vector in the plane of the page, A x V will 
also lie in the plane of the page. It will make a 
90° angle with V, and will have a length ||A|| 
times that of V. Let us take V to be, successively, B', C', B' + C'. Then we see 
that the configuration of the vectors A x B', A x C', A x (B' + C') is similar to the 
configuration of B', C', B' + C', but is ||A|| times as large and is turned through an 
angle of 90° in the plane of the page (see Fig. 78). Necessarily, then, Ax(B' + C') 
will be the diagonal of the parallelogram with adjacent sides AxB', AxC', 
because B' + C' is the diagonal of the parallelogram with adjacent sides B', C'. 
Therefore (10.2-3) is true. 

A second distributive law 

$(A + B) \times C = A \times C + B \times C$

follows at once from (10.2-3) and (10.2-1).

Using the distributive law, we may obtain expressions for the components of
A × B in terms of the components of the vectors A and B. We have

$A \times B = (A_1 i + A_2 j + A_3 k) \times (B_1 i + B_2 j + B_3 k).$

We expand this by repeated use of the distributive law, and use (10.2-2) and 
(10.2-3). Nine terms are obtained in this way. To simplify the result we observe 
the following multiplication table, which is easily worked out directly from the 
definitions. 



Fig. 78.

$i \times i = 0, \quad i \times j = k, \quad i \times k = -j,$
$j \times i = -k, \quad j \times j = 0, \quad j \times k = i,$
$k \times i = j, \quad k \times j = -i, \quad k \times k = 0.$

As a consequence, we find

$A \times B = (A_2 B_3 - A_3 B_2) i + (A_3 B_1 - A_1 B_3) j + (A_1 B_2 - A_2 B_1) k.$ (10.2-6)

It is convenient to remember this formula by noting that if we write C = A × B,
then

$C_1 = \begin{vmatrix} A_2 & A_3 \\ B_2 & B_3 \end{vmatrix}, \quad C_2 = \begin{vmatrix} A_3 & A_1 \\ B_3 & B_1 \end{vmatrix}, \quad C_3 = \begin{vmatrix} A_1 & A_2 \\ B_1 & B_2 \end{vmatrix}.$

These two-row determinants are the cofactors of the elements of the first row of
a three-row determinant having $A_1, A_2, A_3$ and $B_1, B_2, B_3$ as its second and third
rows, respectively. Accordingly, as a memory device, we sometimes write

$A \times B = \begin{vmatrix} i & j & k \\ A_1 & A_2 & A_3 \\ B_1 & B_2 & B_3 \end{vmatrix}.$ (10.2-7)
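
The component formula (10.2-6) is convenient for computation. The following sketch, in which the function name `cross` and the sample vectors are our own choices, evaluates the formula and checks three facts stated in this section: the multiplication table for i, j, k, the perpendicularity of A × B to both factors, and the anticommutative law (10.2-1).

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product in R^3 by the component formula (10.2-6)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2,
            a3 * b1 - a1 * b3,
            a1 * b2 - a2 * b1)

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
# Three entries of the multiplication table.
print(cross(i, j) == k, cross(j, k) == i, cross(k, i) == j)

A, B = (2, -1, 3), (0, 4, 1)
C = cross(A, B)
print(C)
print(dot(C, A), dot(C, B))                 # both zero: C is perpendicular to A and to B
print(cross(B, A) == tuple(-c for c in C))  # anticommutative law (10.2-1)
```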

EXERCISES 

1. Find the indicated cross products. 

(a) (i + j + k) x (i + 2j + 3k);

(b) (2i - 3j - k) x (2i - 5j + 3k);

(c) (i - 2j - k) x (i - 3j + 4k).

2. Find the area of the parallelogram of which the vectors A = i - j + 2k and B =
2i + 4j - k are adjacent sides.

3. (a) Let A, B, C be noncoplanar vectors from O with terminal points P, Q, R
respectively. Explain why (1/2)(B - A) x (C - A) is a vector perpendicular to the plane of
PQR, and of length equal to the area of the triangle PQR. (b) Find the area of the
triangle formed by the points (1, 1, -2), (2, -1, 1), (1, 3, -1).

4. Find A • (B x C) and B • (A x C) if 

(a) A = 2i - 3j + 5k, B = -i + 4j + 2k, C = 2i + 3j;

(b) A = 2i + 3j + k, B = i + 2j + 5k, C = -2i + 4j + 3k.

5. (a) If A = (A_1, A_2, A_3), etc., show that

                  | A_1  A_2  A_3 |
    A • (B x C) = | B_1  B_2  B_3 |.
                  | C_1  C_2  C_3 |

(b) How does it follow from (a) that A • (B x C) = (A x B) • C? 

(c) What is the value of A • (B x C) if any two of the vectors are equal? 

(d) Explain why the numerical value of the determinant in (a) is equal to the volume of 
the parallelepiped having the vectors A, B, C as concurrent edges. 

(e) If A, B, C are permuted in all possible ways in the product A • (B x C), how many different
values can be obtained?

(f) Find the values of D • (B - A) and D • (C - A), where 

D = AxB + BxC + CxA. 

6. Let the pairs A, B and C, D each determine a plane. Write an equation involving
dot and cross products expressing the condition that these two planes be perpendicular.

10.3 / RIGID MOTIONS OF THE AXES 

By a rigid motion of the axes we mean a shift from a rectangular co-ordinate 
system xyz with origin O to another rectangular co-ordinate system x'y'z' with
origin O', both systems having the same unit of distance, and both systems
having the same orientation (i.e., both being right-handed or both left-handed). 
Such a shift can be accomplished in two stages: by a translation to a new system 
with origin O' and axes parallel to the original axes, followed by a rotation about 
O'. The equations for a translation of the co-ordinate system are very simple, 
and need not concern us right now, since we regard the vector space with origin 
O' as not essentially different from the vector space with
origin O. Accordingly we confine our attention to a rotation 
of the co-ordinate axes about a fixed origin O. We deal 
exclusively with right-handed systems. 

Let i', j', k' be the fundamental orthonormal triad 
associated with the x'y'z' system (see Fig. 79). We recall from
analytic geometry the concept of the direction cosines of a
directed line. Let the direction cosines of the x'-axis relative
to the x-, y-, and z-axes respectively be l_1, m_1, n_1. By the
property (10.1-19) of dot products we have

i' • i = l_1,    i' • j = m_1,    i' • k = n_1.

Fig. 79.

Likewise

j' • i = l_2,    j' • j = m_2,    j' • k = n_2


are the direction cosines of the y'-axis relative to the axes of the xyz-system. 
With similar relations for the direction cosines of the z'-axis relative to the axes 
of the xyz-system, all the relations can be conveniently and compactly exhibited 
in a table, as follows. 



            i      j      k

    i'     l_1    m_1    n_1
    j'     l_2    m_2    n_2        (10.3-1)
    k'     l_3    m_3    n_3


Observe that the direction cosines of the x-axis relative to the axes of the 
x'y'z'-system are l_1, l_2, l_3, and so forth.

Now consider the vector OP, where P is any point, with co-ordinates 
(x, y, z) and (x', y', z'), respectively, in the two systems. Then 

OP = xi + yj + zk        (10.3-2)

and

OP = x'i' + y'j' + z'k'.        (10.3-3)

Now observe from (10.3-3) that x' = OP • i', y' = OP • j', and so on. If we form
the dot product OP • i' from (10.3-2), we obtain

x' = OP • i' = x i • i' + y j • i' + z k • i',

or

x' = l_1 x + m_1 y + n_1 z.


By similar arguments we obtain the complete set of relations 

x' = l_1 x + m_1 y + n_1 z
y' = l_2 x + m_2 y + n_2 z        (10.3-4)
z' = l_3 x + m_3 y + n_3 z.

These are the equations of transformation for the rotation of the co-ordinate
system. The inverse transformation can be found in exactly the same way, 
starting from (10.3-3). The equations are 

x = l_1 x' + l_2 y' + l_3 z'
y = m_1 x' + m_2 y' + m_3 z'        (10.3-5)
z = n_1 x' + n_2 y' + n_3 z'.

Consider now a vector A of the vector space with origin O. This vector will
have components A_1, A_2, A_3 in the xyz-system, and A'_1, A'_2, A'_3 in the x'y'z'-
system. Since the components of A are merely the co-ordinates of the terminal
point of A, we see that the two sets of components are related in exactly the
same way that xyz and x'y'z' are related, that is,

A'_1 = l_1 A_1 + m_1 A_2 + n_1 A_3
A'_2 = l_2 A_1 + m_2 A_2 + n_2 A_3        (10.3-6)
A'_3 = l_3 A_1 + m_3 A_2 + n_3 A_3,

with an inverse set of relations corresponding to (10.3-5). The two sets of 
relations are easily kept in mind by a table similar to (10.3-1): 



            A_1    A_2    A_3

    A'_1    l_1    m_1    n_1
    A'_2    l_2    m_2    n_2        (10.3-7)
    A'_3    l_3    m_3    n_3


It follows from these laws of transformation of components that if we know the 
components of a vector in one co-ordinate system, we can find its components in 
any system obtained by a rotation of the axes. 
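
The laws of transformation (10.3-6) and their inverse can be tried out numerically. The following Python sketch is only an illustration: the helper names rotation_about_z, to_primed, and to_unprimed are ours, and the particular table used (a rotation of the axes through an angle theta about the z-axis) is an assumed example, not one taken from the text.

```python
import math

def rotation_about_z(theta):
    """Direction-cosine table (rows i', j', k'; columns i, j, k) for axes
    rotated through the angle theta about the z-axis (illustrative choice)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0.0],
            [-s, c, 0.0],
            [0.0, 0.0, 1.0]]

def to_primed(table, a):
    """Components in the x'y'z'-system: read the table by rows, as in (10.3-6)."""
    return [sum(row[n] * a[n] for n in range(3)) for row in table]

def to_unprimed(table, a_primed):
    """Components in the xyz-system: read the table by columns, as in (10.3-5)."""
    return [sum(table[m][n] * a_primed[m] for m in range(3)) for n in range(3)]

table = rotation_about_z(math.pi / 2)   # x'-axis along y, y'-axis along negative x
a = [1.0, 2.0, 3.0]
a_primed = to_primed(table, a)
a_back = to_unprimed(table, a_primed)
print([round(v, 12) for v in a_primed])
print([round(v, 12) for v in a_back])
```

Up to rounding, the first line printed should be close to (2, -1, 3) and the second should reproduce the original components (1, 2, 3), which illustrates that the same table serves for both directions of the transformation.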


EXERCISES 

1. Show that 

i' = l_1 i + m_1 j + n_1 k,    i = l_1 i' + l_2 j' + l_3 k',


and obtain four other allied relations. Start from the fact that, if A is any vector, 
A = (A • i)i + (A • j)j + (A • k)k. 

2. What is the numerical value of i • (j x k)? Express this product in terms of i', j', k'
by the results of Exercise 1, and deduce that

    | l_1  l_2  l_3 |
    | m_1  m_2  m_3 | = 1.
    | n_1  n_2  n_3 |


See Exercise 5a, § 10.2. 

3. Observe that if one solves (10.3-5) for x' by Cramer’s rule, and uses the result of
Exercise 2, one finds

x' = (m_2 n_3 - m_3 n_2)x + (n_2 l_3 - n_3 l_2)y + (l_2 m_3 - l_3 m_2)z.

Comparing with (10.3-4), we surmise that

    l_1 = | m_2  m_3 |,    m_1 = | n_2  n_3 |,    n_1 = | l_2  l_3 |.
          | n_2  n_3 |           | l_2  l_3 |           | m_2  m_3 |


Prove these results directly by considering the cross product i' = j' x k'. What similar 
results are given as a consequence of the relation i = j x k? Show that all the relations of 
this type are summarized in the statement: Any element of the determinant appearing in 
Exercise 2 is equal to its own cofactor. 

4. Of what dot-product relation is the equation l_1^2 + m_1^2 + n_1^2 = 1 the expression?
Answer the corresponding question for each of the equations

l_1 l_2 + m_1 m_2 + n_1 n_2 = 0,    l_1 m_1 + l_2 m_2 + l_3 m_3 = 0.

5. Suppose the nine direction cosines in the table (10.3-1) are as follows: 



(a) Find the components in the x'y'z'-system of the vector 14i-21j + 7k. 

(b) Find the components in the xyz-system of the vector 7i' + 28j' - 35k'. 

6. Suppose the nine direction cosines in the table (10.3-1) are as follows: 


            i        j        k

    i'     1/√6    -1/√2    1/√3
    j'     1/√6     1/√2    1/√3
    k'    -2/√6     0       1/√3


(a) Find √2(i + j) - k in terms of i', j', k'.

(b) Find i' - j' + √6 k' in terms of i, j, k.


10.4 / INVARIANTS 

The concept of invariance and of something that is invariant is most simply 
illustrated (for our purposes) by showing how the expression for the dot product 
of two vectors in R^3 (as defined in (10.1-12)) is represented in terms of the
components of the vectors with respect to an x'y'z'-co-ordinate system obtained
by a rotation from the xyz-co-ordinate system used in R^3. The relationship
between the components of vectors in the two systems is explained in §10.3. The
fact that the components of A along the axes of the rotated system are A'_1, A'_2,
A'_3 enables us to write

A = A'_1 i' + A'_2 j' + A'_3 k'.

There is a similar representation of B. Therefore

A • B = (A'_1 i' + A'_2 j' + A'_3 k') • (B'_1 i' + B'_2 j' + B'_3 k').

We can calculate this dot product by the algebraic rules governing dot products. 
Because of the fact that the vectors i', j', k' form an orthonormal system we have 
relationships such as i' • i' = 1, i' • j' = 0, and so on. When we complete the
calculations we find that

A • B = A'_1 B'_1 + A'_2 B'_2 + A'_3 B'_3.        (10.4-1)

From the definition of A • B in (10.1-12) we now see that

A_1 B_1 + A_2 B_2 + A_3 B_3 = A'_1 B'_1 + A'_2 B'_2 + A'_3 B'_3.        (10.4-2)

Thus we see that the formula for A • B is the same in terms of the primed 
components as it is in terms of the unprimed components. That is what we mean 
when we say that the expression on the left in (10.4-2) is an invariant, or is
invariant with respect to a rotation of the co-ordinate axes. We could anticipate 
this result, of course, because of formula (10.1-19), which expresses A • B in 
geometric terms, using the length of the vectors and the angle between them, 
which we are already accustomed to think of as being independent of the choice 
of a particular co-ordinate system. 
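
The invariance expressed by (10.4-2) is easy to test numerically. In the Python sketch below, the helper names dot and to_primed, the rotation angle, and the sample vectors are all assumptions made for illustration; the table is that of a rotation of the axes about the z-axis.

```python
import math

def dot(a, b):
    """Sum of products of corresponding components."""
    return sum(x * y for x, y in zip(a, b))

def to_primed(table, a):
    """Components with respect to the rotated axes, as in (10.3-6)."""
    return [dot(row, a) for row in table]

theta = 0.7  # arbitrary angle, chosen only for illustration
c, s = math.cos(theta), math.sin(theta)
table = [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

A = [2.0, -3.0, 5.0]
B = [-1.0, 4.0, 2.0]
lhs = dot(A, B)                                       # unprimed components
rhs = dot(to_primed(table, A), to_primed(table, B))   # primed components
print(abs(lhs - rhs) < 1e-12)
```

The two sums agree to within rounding error, as (10.4-2) asserts.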

Another example of invariance is provided by the cross product of two 
vectors. By this we mean that the cross product A x B can be expressed in the 
form 

A x B = (A'_2 B'_3 - A'_3 B'_2)i' + (A'_3 B'_1 - A'_1 B'_3)j' + (A'_1 B'_2 - A'_2 B'_1)k',        (10.4-3)

which has the same form as (10.2-6) except that everything (components and the 
set of orthonormal vectors) is referred to the x'y'z'-co-ordinate axes. The 
procedure used to derive (10.2-6) started from a geometrical definition of A x B, 
without the explicit involvement of a co-ordinate system. The derivation of 
(10.2-6) depended on expressing A and B as linear combinations of i, j and k and 
then making use of the algebraic rules (10.2-1), (10.2-2), (10.2-3) and the nine 
special formulas for the various cross products such as i x i, i x j, i x k, ..., as
exhibited in §10.2. The same method will lead us to the formula (10.4-3) if we
express A and B as linear combinations of i', j', and k', and if we make use of the
special formulas for the various cross products such as

i' x i' = 0,    i' x j' = k',    i' x k' = -j',

and so on. The correctness of these formulas is apparent from the geometric 
definition of cross products. 

We could have proceeded entirely differently, without a geometric definition
of A x B. Let us sketch out this alternate procedure, starting with a definition of
A x B, entirely analytically, in R^3. This new definition, in terms of components,
is:

A x B = (A_2 B_3 - A_3 B_2, A_3 B_1 - A_1 B_3, A_1 B_2 - A_2 B_1).        (10.4-4)


Next, we verify that the algebraic rules 


A x B = -(B x A),    (cA) x B = c(A x B),        (10.4-5)

and

A x (B + C) = (A x B) + (A x C)        (10.4-6)

are valid, using only (10.4-4) and the algebraic rules in §10.1. This verification is 
entirely straightforward; we shall not give the details. 

From (10.4-4) and the rules for evaluating a determinant we readily find that 


              | A_1  A_2  A_3 |
(A x B) • C = | B_1  B_2  B_3 |.        (10.4-7)
              | C_1  C_2  C_3 |

We verify this by expanding the determinant by minors of elements in the third
row. The components of A x B, as given in (10.4-4), are the cofactors of C_1, C_2,
C_3, respectively, in (10.4-7). Because a determinant is equal to zero if any row in
it is the same as, or a multiple of, another row, it is apparent from (10.4-7) that
(A x B) • C = 0 if two of the three vectors are the same or if one of them is a
multiple of another.

The multiplication table for the various cross products among the vectors i, 
j, k is easily worked out, using the definition (10.4-4). These results are listed in 
§10.2 (where they were obtained by use of the former method of definition). 

We turn now to consider the effect of the introduction of a new x'y'z'-co-
ordinate system, related to the basic xyz-system in R^3 by rotating the axes. Our
aim is to show that, if A and B have components A'_1, A'_2, A'_3 and B'_1, B'_2, B'_3 with
respect to the x'y'z'-system, then A x B has the components

A'_2 B'_3 - A'_3 B'_2,    A'_3 B'_1 - A'_1 B'_3,    A'_1 B'_2 - A'_2 B'_1

in the new system. If new unit vectors i', j', k' (along the positive x', y', and z'
co-ordinate axes, respectively) are introduced, this means that

(A_2 B_3 - A_3 B_2)i + (A_3 B_1 - A_1 B_3)j + (A_1 B_2 - A_2 B_1)k =
(A'_2 B'_3 - A'_3 B'_2)i' + (A'_3 B'_1 - A'_1 B'_3)j' + (A'_1 B'_2 - A'_2 B'_1)k',        (10.4-8)

so that the cross product A x B, even though defined by (10.4-4) in terms of one 
preferred co-ordinate system, is actually a vector invariant with respect to 
rotations of the co-ordinate axes. To prove this we shall first of all need to know 
some facts about i', j', k' and their relationships to i, j, k. We can express i', j', k' 
as linear combinations of i, j, k as follows:

i' = l_1 i + m_1 j + n_1 k
j' = l_2 i + m_2 j + n_2 k        (10.4-9)
k' = l_3 i + m_3 j + n_3 k.

Also, because i', j', and k' are linearly independent and R^3 is three dimensional,
any vector A in R^3 is a linear combination of i', j' and k', the coefficient of i'
being A • i', and so on. In this way we can see with the aid of (10.4-9)
that

i = l_1 i' + l_2 j' + l_3 k'
j = m_1 i' + m_2 j' + m_3 k'        (10.4-10)
k = n_1 i' + n_2 j' + n_3 k'.

For example, the coefficient of j' in the expression for k is k • j' = n 2 , as we can 
see from the second equation in (10.4-9). There are various relationships 
between the elements in the determinant 



        | l_1  m_1  n_1 |
    D = | l_2  m_2  n_2 |.        (10.4-11)
        | l_3  m_3  n_3 |

For example, l_1^2 + l_2^2 + l_3^2 = 1 is the expression of the fact that ||i|| = 1, and l_1 l_2 +
m_1 m_2 + n_1 n_2 = 0 is the expression of the fact that i' • j' = 0.

Now let us find the values of i' x i', i' x j', i' x k', and so on. It is clear from
the very definition (10.4-4) that i' x i' = j' x j' = k' x k' = 0, so we can concentrate
on i' x j', j' x k', k' x i' and then obtain j' x i', etc., by using the first result in
(10.4-5). We observe that the x' and y' components of i' x j' are (i' x j') • i' and
(i' x j') • j', both of which are zero because, as we saw from (10.4-7), (A x B) • C = 0
if any two of the three vectors are the same. The z' component of i' x j' is
(i' x j') • k', and this is D, as we see from (10.4-7), (10.4-9), and (10.4-11). Thus
i' x j' = Dk'. We can find j' x k' and k' x i' by the same method. We display the
results:

i' x j' = Dk'
j' x k' = Di'        (10.4-12)
k' x i' = Dj'.


We don’t yet know the value of D, but we will find it. Let us calculate j x k, 
using the second and third formulas in (10.4-10): 


j x k = (m_1 i' + m_2 j' + m_3 k') x (n_1 i' + n_2 j' + n_3 k').

When this is worked out using the rules (10.4-5) and (10.4-6) and the results
(10.4-12), we find

j x k = (m_2 n_3 - m_3 n_2)D i' + (m_3 n_1 - m_1 n_3)D j' + (m_1 n_2 - m_2 n_1)D k'.

But j x k = i = l_1 i' + l_2 j' + l_3 k', and therefore

l_1 = D(m_2 n_3 - m_3 n_2)
l_2 = D(m_3 n_1 - m_1 n_3)
l_3 = D(m_1 n_2 - m_2 n_1).

If we multiply these equations by l_1, l_2, l_3 respectively, and add, we obtain

1 = l_1^2 + l_2^2 + l_3^2 = D^2,

because, as we see from (10.4-11),

l_1(m_2 n_3 - m_3 n_2) + l_2(m_3 n_1 - m_1 n_3) + l_3(m_1 n_2 - m_2 n_1) = D.

But then D = ±1 and i' x j' = ±k'. We must choose the value D = 1, rather than 
D = - 1 because the x'y'z'-system, being obtained from the xyz-system by a rigid 
rotation, is right handed. 

We can now complete the proof of (10.4-8). The left side is A x B, which we 
can write as 

(A'_1 i' + A'_2 j' + A'_3 k') x (B'_1 i' + B'_2 j' + B'_3 k').

When this cross product is expanded and the results simplified, using (10.4-12) 
(with D = 1) and the other cross-product relationships involving i', j', k', we 
obtain the expression on the right side of the equation in (10.4-8). 
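
The argument just completed can be illustrated with exact arithmetic. The Python sketch below uses a table of direction cosines with rational entries, chosen by us purely for illustration (its rows are mutually perpendicular unit vectors forming a right-handed triad); the helper names cross, dot, and to_primed are likewise hypothetical. It checks that D = 1 and that the analytic definition (10.4-4), applied to the primed components, gives the primed components of A x B.

```python
from fractions import Fraction as Fr

def cross(a, b):
    """Analytic definition (10.4-4) applied to component triples."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def to_primed(table, a):
    """Components with respect to the rotated axes, as in (10.3-6)."""
    return [dot(row, a) for row in table]

# Rows are the xyz-components of i', j', k' (an assumed example table).
table = [[Fr(2, 7), Fr(3, 7), Fr(6, 7)],
         [Fr(3, 7), Fr(-6, 7), Fr(2, 7)],
         [Fr(6, 7), Fr(2, 7), Fr(-3, 7)]]

# D is the determinant (10.4-11), computed as (i' x j') . k' by (10.4-7).
D = dot(cross(table[0], table[1]), table[2])

A = [Fr(1), Fr(2), Fr(3)]
B = [Fr(4), Fr(5), Fr(6)]
lhs = to_primed(table, cross(A, B))                    # primed components of A x B
rhs = cross(to_primed(table, A), to_primed(table, B))  # (10.4-4) in primed components
print(D == 1, lhs == rhs)
```

Because the entries are exact fractions, the comparison involves no rounding, and both tests should print True.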

In the rest of this chapter we often talk about a point of Euclidean space as 
an entity that has an existence quite apart from any co-ordinate system. We can 
use the co-ordinates (x, y, z) of a point P in some rectangular co-ordinate system, 
but we make a distinction between the point P and the number triple (x, y, z). 
This is a different point of view than that in §10.1, where the number triple was 
the point. From this new point of view Euclidean space of three dimensions is 
not the same as R^3, in which there is a preferred point (the origin) and a
preferred rectangular co-ordinate system. When we talk about the Euclidean 
vector space of three dimensions we have introduced the origin as a special 
point. The points are then the same as vectors, and, if we do not talk about 
components, we are talking in a co-ordinate-free mode. If we do introduce 
co-ordinates, which it is often convenient to do (perhaps only temporarily), we 
may nevertheless find it necessary to establish that certain things we are talking 
about are invariant with respect to specified changes in the co-ordinate system, 
such as rotation of the axes. It is to familiarize the student with the concept of 
invariance that we have engaged in this discussion of scalar and vector in- 
variants. 

The foregoing remarks are relevant to our discussion of gradient, divergence,
and curl, in sections to come.


EXERCISES 

1. Show that, if the co-ordinate system is rotated 90° about the z-axis, so that the 
x'-axis is the same as the y-axis and the y'-axis coincides with the negative x-axis, the
scheme (10.3-7) becomes

            A_1    A_2    A_3

    A'_1     0      1      0
    A'_2    -1      0      0
    A'_3     0      0      1


Then, show by an example that, if A and B are vectors, A_1 B_3 is not a scalar invariant.
Likewise show that the triple (A_1 B_1, A_2 B_2, A_3 B_3) does not define a vector invariant, that
is, that in general the vector having components (A_1 B_1, A_2 B_2, A_3 B_3) in the xyz-system
does not have components (A'_1 B'_1, A'_2 B'_2, A'_3 B'_3) in the x'y'z'-system. Consider, e.g., A = i,
B = i + j.

2. Is A_1 + A_2 + A_3 a scalar invariant if A = A_1 i + A_2 j + A_3 k? Justify your answer.

3. Let the nine direction cosines in the table (10.3-1) be specified as follows: 


            i        j        k

    i'     1/√2     1/√2     0
    j'     1/√3    -1/√3     1/√3
    k'     1/√6    -1/√6    -2/√6


Let A = i + j, B = i - j + k.

(a) Calculate A x B directly in terms of i, j, k.

(b) Calculate A and B and then A x B in terms of i', j', k'.

(c) Reconcile the results in (a) and (b) by converting A x B from (b) to the answer in (a)
by expressing i', j', k' in terms of i, j, k. 


10.5 / SCALAR POINT FUNCTIONS

The word “scalar” is used to contrast with the word “vector.” It is customary, in 
any context where vectors and real numbers are both being discussed, to refer to 
real numbers as scalars. Thus, in this book, a scalar is a real number. The word 
may also be used as an adjective. 

Let us recall the general meaning of the word “function.” A function is a 
correspondence between two classes of objects; these two classes are called 
respectively the domain of definition of the function and the range of values of 
the function, or, more briefly, the domain and the range of the function. The 
function itself is the correspondence whereby to each object in the domain is 
assigned a corresponding object in the range. Let us now consider a case in 
which the domain is a class of points and the range is a class of real numbers. In 
such a case we shall call the function a scalar point function. If f denotes the
function, P a variable point, and u the value of the function at P, we write
u = f(P).

Example 1. At a fixed instant of time consider all the points in the earth’s 
atmosphere, and let u be the temperature (in degrees centigrade) at the point P. 

Example 2. On the surface of a sphere of unit radius, let Q be a fixed point. 
If P is any point on the sphere, let u be the shortest great-circle distance from P 
to Q. 

Example 3. If C is a curve in a plane, and P is any point of C, let u be the 
radius of curvature of C at P, provided the curve is smooth enough to have a 
well-defined curvature. 

The essential thing to be noted about the definition of a scalar point function 
is that it is independent of co-ordinate systems. We do not need co-ordinate 
systems to explain the concept of a scalar point function. And, if a co-ordinate 
system is used to locate various points, the function is essentially the cor- 
respondence between P and u, not a correspondence between the co-ordinates 
of P and u. There is, however, a link between a point function f(P) and certain
functions of the co-ordinates of P. This link is the concept of invariance under
co-ordinate transformations. Suppose, for instance, that f is a given point
function, defined throughout some region of Euclidean three-dimensional space.
Choose a system of rectangular co-ordinates x, y, z, and let (x, y, z) be
co-ordinates of P. Then the value of f at P will depend on (x, y, z), say
f(P) = F(x, y, z), so that the scalar point function f determines F as a
real-valued function of the three variables x, y, z. Suppose another rectangular
co-ordinate system x', y', z' is obtained from the xyz-system by a rotation of
axes. When x, y, z are expressed in terms of x', y', z' by the equations of the
form (10.3-5), F(x, y, z) is transformed into some new function G(x', y', z'). But,
since (x, y, z) and (x', y', z') refer to the same point P, we have f(P) = G(x', y', z').
The scalar point function f is invariant under the transformation of co-
ordinates. We may also express f(P) as a function of cylindrical or spherical
co-ordinates.
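
The relation between the two representations F and G of one scalar point function can be made concrete as follows. In this Python sketch the particular function F, the rotation angle, and all helper names are assumptions introduced only for illustration; G is obtained from F by substituting the equations of the form (10.3-5).

```python
import math

theta = math.pi / 6  # arbitrary rotation of the axes about the z-axis
c, s = math.cos(theta), math.sin(theta)
# Rows hold (l, m, n) for the x'-, y'-, z'-axes, as in table (10.3-1).
table = [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def primed_coords(x, y, z):
    """Equations (10.3-4): co-ordinates of a point in the x'y'z'-system."""
    return tuple(l * x + m * y + n * z for l, m, n in table)

def unprimed_coords(xp, yp, zp):
    """Equations (10.3-5): the same table read by columns."""
    p = (xp, yp, zp)
    return tuple(sum(table[r][col] * p[r] for r in range(3)) for col in range(3))

def F(x, y, z):
    """Hypothetical representation of f in the xyz-system."""
    return x * x + 2.0 * y - z

def G(xp, yp, zp):
    """Representation of the same f in the x'y'z'-system."""
    return F(*unprimed_coords(xp, yp, zp))

P = (1.0, -2.0, 0.5)            # co-ordinates of a point in the xyz-system
P_primed = primed_coords(*P)    # co-ordinates of the same point, rotated axes
print(abs(F(*P) - G(*P_primed)) < 1e-12)
```

F and G are different functions of three variables, yet they return the same value when each is fed the co-ordinates of the same point in its own system.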

Example 4. If O is a fixed point of space, consider the vector space with 
origin O. If P is an arbitrary point, denote the vector OP by R. Let A be a fixed 
vector. Then 

f(P) = A • R and φ(P) = ||R||^2

are scalar point functions. If we introduce a fixed rectangular co-ordinate system
with O as origin, and let A = (A_1, A_2, A_3), R = (x, y, z), then in this co-ordinate
system we have

f(P) = A_1 x + A_2 y + A_3 z,

φ(P) = x^2 + y^2 + z^2.

It is possible to define directly the concepts of continuity and differentiability
for scalar point functions, without reference to co-ordinate systems.
Alternatively, however, the definitions may be made with reference to some
arbitrarily chosen rectangular co-ordinate system. Thus, if f(P) = F(x, y, z) in
that system, we can say that f is continuous at P_0 if F is continuous at (x_0, y_0,
z_0), with a similar definition for differentiability. Although these definitions are
made with reference to a particular rectangular co-ordinate system, they are
actually independent of the choice of that system. For example, if F(x, y, z) =
G(x', y', z'), where the two systems are related by a rotation, and if F is
differentiable, then G is differentiable also (by Theorem V, §7.3).


10.51 / VECTOR POINT FUNCTIONS 

The concept of a vector point function is similar to the concept of a scalar point 
function in the matter of being independent of particular choices of co-ordinate 
systems. The difference is that the function values are vectors instead of scalars. 
The domain of definition of a vector point function is some set of points P. The 
range of values of the function is some set of vectors. Let f denote the function, 
and let F denote the vector corresponding to P. Then we write F = f(P) where 
f(P) depends just on P itself and not on the co-ordinates we happen to be using. 

Example 1. Let R = OP, and let A be a fixed vector. Then each of the 
expressions 

R,    (A • R)R,    A x R,    ||R||^3 R
defines a vector point function. 

It is often convenient to introduce a co-ordinate system in order to deal with 
a vector point function. We shall be concerned mostly with rectangular co- 
ordinate systems. If we have an xyz-system with origin O, let i, j, k be the 
fundamental orthonormal triad associated with the xyz-system. Suppose we have 
a function F = f(P); let the components of F be denoted by F_1(x, y, z), F_2(x, y, z),
F_3(x, y, z). Then

F = F_1(x, y, z)i + F_2(x, y, z)j + F_3(x, y, z)k.        (10.51-1)

In another rectangular system x'y'z', obtained from the xyz-system by rotation, 
F will have a different set of components, and with the triad i', j', k' for the new 
system, F will be expressed in the form 

F = F'_1(x', y', z')i' + F'_2(x', y', z')j' + F'_3(x', y', z')k'.        (10.51-2)

The components F'_1, F'_2, F'_3 will be related to F_1, F_2, F_3 in the same way that A'_1,
A'_2, A'_3 are related to A_1, A_2, A_3 in equations (10.3-6). Also, the two orthonormal
triads are related in the manner indicated by table (10.3-1). 

The value F of the vector point function is a vector invariant. Note, however, 
that an individual component of F, such as F_1(x, y, z), is not a scalar invariant,
for in general F_1(x, y, z) ≠ F'_1(x', y', z').

If F_1, F_2, F_3 are any three functions of x, y, z, we may regard them as
components of a vector point function F defined by (10.51-1), provided we
understand that F_1, F_2, F_3 are the components of F in one fixed rectangular
co-ordinate system, and that the components in any other rectangular system are 
obtained by transformations of the type (10.3-6). 

A vector point function is often called a vector field, 
especially if the domain of definition is a region in two or 
three dimensions. This terminology springs from a 
certain mode of representing the vector function. Let the 
vector F be drawn in space with its initial point at the 
point P to which F corresponds. This allows one to 
portray the manner in which the magnitude and direction 
of F vary with the location of P. The geometrical 
configuration, whether actually drawn or merely con- 
ceived, is called a vector field (see Fig. 80). 

Example 2. Consider the gravitational field of the earth. If O is the center of 
the earth, M is the mass of the earth, and r is the distance from O to an 
arbitrary point P outside the earth, then a particle of unit mass placed at P is 
attracted toward O by a force of magnitude 

k M/r^2,        (10.51-3)

where k is a constant of proportionality. This force can be represented by a 
vector of magnitude given by (10.51-3), and of direction opposite to that of the 
vector R = OP. The magnitude of R is r. Therefore a vector of unit length in the
direction opposite to that of R is

-(1/r) R.

Fig. 80.

Fig. 81.

Thus the gravitational force F on the unit mass at P is

F = -k (M/r^3) R.        (10.51-4)

A portrayal of this vector field is suggested by Fig. 81. 
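
For readers who wish to experiment, the field of Example 2 can be evaluated at a sample point. The Python sketch below is illustrative only: the function name gravity is ours, and the values k = 1 and M = 1 are placeholders rather than physical constants.

```python
import math

def gravity(R, k=1.0, M=1.0):
    """Force on a unit mass at the terminal point of R, following (10.51-4).
    The defaults k = 1 and M = 1 are placeholders, not physical values."""
    r = math.sqrt(sum(c * c for c in R))
    if r == 0.0:
        raise ValueError("the field is not defined at the origin O")
    return tuple(-k * M * c / r ** 3 for c in R)

R = (3.0, 0.0, 4.0)                       # here r = 5
F = gravity(R)
magnitude = math.sqrt(sum(c * c for c in F))
inward = sum(f * c for f, c in zip(F, R)) < 0.0
print(abs(magnitude - 1.0 / 25.0) < 1e-12, inward)
```

The magnitude agrees with k M/r^2 from (10.51-3), and the negative dot product with R confirms that the force points toward O.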


EXERCISES 

1. The equations 


x = (1/√2)(x' - y'),
y = (1/2)(x' + y') + (1/√2)z',
z = -(1/2)(x' + y') + (1/√2)z'

define a rotation of axes. 

(a) If F(x, y, z) = lx 2 -y 2 -z 2 is the representation of a scalar point function in the 
xyz-system, find the representation G(x', y', z') of the function in the x'y'z'-system.

(b) What is the table (10.3-7) for this rotation of axes? Find the xyz-representation
F(x, y, z) for the scalar point function for which G(x', y', z') = x' + y' + √2 z'.

(c) Find the components of the vector field F = √2 z i + (y + z)j + √2 x k in the x'y'z'-
system.

(d) Express the vector field F = √2 y' i' + z'j' + x'k' in the xyz-system.

2. The equations 

x' = (1/7)(2x + 3y + 6z),
y' = (1/7)(3x - 6y + 2z),
z' = (1/7)(6x + 2y - 3z)

define a rotation of axes. 

(a) Express x, y, z in terms of x', y', and z'.

(b) Find the expression of x'^2 + y'^2 - z'^2 in the xyz-system.

(c) Express the vector field -yi + xj in the x'y'z'-system.

(d) Do xi + 2yj + 3zk and x'i' + 2y'j' + 3z'k' represent the same vector field? Justify your
answer. 


10.6 / THE GRADIENT OF A SCALAR FIELD 

A scalar point function f may be thought of by imagining each point P at which
f is defined as carrying a label with the value f(P) of the function at that point
(see the discussion of the second mode of representing functions in §5.4). When
a scalar function is represented in this way, it is often called a scalar field.

Let f be a scalar field defined throughout some region R, and suppose that f
is differentiable in R. We are going to define the concept of the gradient of the
field. The gradient of a scalar point function is a vector point function. It is
convenient to use a rectangular co-ordinate system in the process of defining the
gradient. However, we must take care to be certain that the definition gives us a 
result which is independent of the choice of the particular rectangular co- 
ordinate system. 

Let a system of rectangular co-ordinates xyz, with origin O, be selected
arbitrarily.

In this co-ordinate system let P have co-ordinates (x, y, z), and let f(P) =
F(x, y, z) be the representation of our point function. Form the vector field
whose representation in the xyz-system is

(∂F/∂x)i + (∂F/∂y)j + (∂F/∂z)k.        (10.6-1)


This vector field, or vector point function, is what we shall call the gradient of 
the given scalar function f. 

Presently we shall give an interpretation of the gradient which enables us to 
think of it apart from the co-ordinate system. But first let us show that the 
definition of the gradient yields the same vector field no matter what rectangular 
co-ordinate system is chosen. If a second co-ordinate system is selected, the two 
systems are related in such a way that one can be obtained from the other by 
either a translation or a rotation, or both. Let us consider the case where both 
systems have the same origin, and are related by a rotation of axes. This is the 
case of principal importance for our discussion of the gradient. The case where a 
translation may be involved is considered in Exercise 13. 

Let the co-ordinates of the two systems be xyz and x'y'z', related as in 
(10.3-4) and (10.3-5), and let the two representations of the scalar field be 


f(P) = F(x, y, z) = G(x', y', z'). 


We wish to show that 


(∂F/∂x)i + (∂F/∂y)j + (∂F/∂z)k = (∂G/∂x')i' + (∂G/∂y')j' + (∂G/∂z')k'.        (10.6-2)


Once this is done, it will be clear that the definition of the gradient of f by 
expression (10.6-1) is invariant under a rotation of the axes. 

To prove (10.6-2) we must show that the triple (∂G/∂x', ∂G/∂y', ∂G/∂z') is related to
(∂F/∂x, ∂F/∂y, ∂F/∂z) just as the triple (A'_1, A'_2, A'_3) is related to the triple
(A_1, A_2, A_3) in (10.3-6) or (10.3-7). Now, by the chain rule,

∂G/∂x' = (∂F/∂x)(∂x/∂x') + (∂F/∂y)(∂y/∂x') + (∂F/∂z)(∂z/∂x'),

with similar equations for ∂G/∂y' and ∂G/∂z'. From (10.3-5) we see that

∂x/∂x' = l_1,    ∂y/∂x' = m_1,    ∂z/∂x' = n_1.

Therefore

∂G/∂x' = l_1 ∂F/∂x + m_1 ∂F/∂y + n_1 ∂F/∂z.

This is exactly like the first equation in (10.3-6). The other two relations of this 
type are established in the same way, and thus (10.6-2) is proved. 
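
The invariance just proved can also be checked numerically, by estimating the partial derivatives with central differences. The Python sketch below is a rough illustration under stated assumptions: the function F, the rotation (about the z-axis), the step h, and the helper names are all ours, and the derivatives are approximations rather than exact values.

```python
import math

theta = 0.4  # arbitrary rotation of the axes about the z-axis
c, s = math.cos(theta), math.sin(theta)
table = [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def to_primed(v):
    """Read the table by rows, as in (10.3-4) and (10.3-6)."""
    return [sum(table[r][n] * v[n] for n in range(3)) for r in range(3)]

def to_unprimed(vp):
    """Read the table by columns, as in (10.3-5)."""
    return [sum(table[r][n] * vp[r] for r in range(3)) for n in range(3)]

def F(x, y, z):
    """Hypothetical representation of a scalar field in the xyz-system."""
    return x * x * y + 3.0 * y * z

def G(xp, yp, zp):
    """Representation of the same field in the x'y'z'-system."""
    return F(*to_unprimed([xp, yp, zp]))

def grad(func, p, h=1e-5):
    """Central-difference estimate of the three partial derivatives at p."""
    g = []
    for n in range(3):
        ahead, behind = list(p), list(p)
        ahead[n] += h
        behind[n] -= h
        g.append((func(*ahead) - func(*behind)) / (2.0 * h))
    return g

P = [1.0, -2.0, 0.5]
grad_G = grad(G, to_primed(P))        # (dG/dx', dG/dy', dG/dz') at the same point
expected = to_primed(grad(F, P))      # (dF/dx, dF/dy, dF/dz) transformed by (10.3-6)
print(max(abs(a - b) for a, b in zip(grad_G, expected)) < 1e-6)
```

The partial derivatives of G agree, to within the accuracy of the difference quotients, with the partial derivatives of F transformed as vector components.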

Frequently the value of a scalar function is denoted by a dependent variable, 
e.g., u = f(P). In such cases the gradient is denoted by either of the symbols

grad u or grad f.

The symbolism ∇u or ∇f is also employed. Thus, in any rectangular co-ordinate
system xyz, with corresponding orthonormal triad i, j, k,

grad u = ∇u = (∂u/∂x)i + (∂u/∂y)j + (∂u/∂z)k.        (10.6-3)

Example 1. If

u = 2x^2 + 3y^2 + 4z^2,

then

∇u = 4xi + 6yj + 8zk.

Let us now consider some properties of the gradient. 


THEOREM I. At any given point in the region where the differentiable scalar
field u = f(P) is defined let a direction be chosen. The rate of change of u
(per unit distance) at the given point in the given direction is equal to the
component of the gradient of u in that direction. If grad u ≠ 0 at the point, the
direction of the gradient is that in which the rate of increase of u is greatest.


Proof. Let a rectangular co-ordinate system be chosen so that the given
direction is that of the positive x-axis. The rate of change in the given direction
is then ∂u/∂x. But this is precisely the component of ∇u in the x-direction, by
(10.6-3). The first assertion in the theorem is now justified. The correctness of
the second assertion is an immediate consequence, since, of all components of a
non-zero vector, the largest one is that in the direction of the vector itself.

The rate of change of a function at a given point in a given direction is called 
the directional derivative. 
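Theorem I can be illustrated numerically. The sketch below is an assumption-laden illustration, not the book's method: it reuses the field of Example 1, picks an arbitrary point and unit vector, and approximates all derivatives by central differences. It compares the rate of change along a unit vector `n` with the component of the gradient along `n`, and then repeats the comparison in the direction of the gradient itself.

```python
import math


def grad(f, p, h=1e-5):
    """Central-difference approximation to the gradient of f at p."""
    g = []
    for i in range(len(p)):
        plus = list(p)
        minus = list(p)
        plus[i] += h
        minus[i] -= h
        g.append((f(*plus) - f(*minus)) / (2 * h))
    return tuple(g)


def directional_derivative(f, p, n, h=1e-5):
    """Rate of change of f per unit distance at p in the direction of the unit vector n."""
    fwd = [pi + h * ni for pi, ni in zip(p, n)]
    bwd = [pi - h * ni for pi, ni in zip(p, n)]
    return (f(*fwd) - f(*bwd)) / (2 * h)


def u(x, y, z):
    return 2 * x**2 + 3 * y**2 + 4 * z**2


p = (1.0, 1.0, 1.0)
g = grad(u, p)                              # approximately (4, 6, 8)
norm_g = math.sqrt(sum(c * c for c in g))

n = (1 / 3, 2 / 3, 2 / 3)                   # an arbitrary unit vector
rate = directional_derivative(u, p, n)
component = sum(gi * ni for gi, ni in zip(g, n))

n_max = tuple(c / norm_g for c in g)        # unit vector along the gradient
rate_max = directional_derivative(u, p, n_max)
print(rate, component, rate_max, norm_g)
```

In the direction of the gradient the rate of change equals the length of the gradient, and it exceeds the rate found in the other direction tried, as the theorem asserts.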

The importance of Theorem I is that it enables us to see the significance of 
the gradient without any reference to co-ordinate systems. A further grasp of the 
significance of the gradient is obtainable with the aid of the concept of level 
surfaces (see the latter part of §5.4). Let us assume that the gradient is a
continuous vector function, i.e., that $\partial u/\partial x$, $\partial u/\partial y$, $\partial u/\partial z$ are continuous functions. Let $P_0$
be a point at which these three partial derivatives are not all zero; this is the
same as requiring that $\nabla u \neq \mathbf{0}$ at $P_0$. If $u_0$ is the value of $u$ at $P_0$, it can be shown
by the implicit-function theorem (§8.1) that, at least in some sufficiently small
neighborhood of $P_0$, the points at which $u = u_0$ form a surface having a tangent
plane whose normal varies continuously in direction as the point of tangency
varies. This smooth surface is part of the level surface $u = u_0$.

THEOREM II. At any point where the gradient is not the vector 0, the gradient 
is perpendicular to the level surface through the point. 


Proof. Let the representation of the scalar field be $u = F(x, y, z)$. The
equation of the level surface is $F(x, y, z) = u_0$. But we know (Example 1, §6.6)
that the direction of the normal to the surface is given by the ratios

$$\frac{\partial F}{\partial x} : \frac{\partial F}{\partial y} : \frac{\partial F}{\partial z}.$$

These partial derivatives are precisely the components of the gradient, and so
the theorem is proved.

Example 2. Consider the scalar field $u = r^3$, where $r$ is the distance $OP$ ($O$
fixed, $P$ variable). Clearly $u$ is constant on a surface if and only if $r$ is constant
on that surface. Thus the level surfaces are spheres with center $O$. Evidently $u$
increases when $r$ increases, the rate of increase being

$$\frac{du}{dr} = 3r^2.$$

It follows from Theorem II that $\nabla u$ is normal to the sphere at $P$. From Theorem I
we see that the direction of $\nabla u$ is outward from the sphere at $P$, and that its
length is $3r^2$. Now a unit vector in the direction of $\overrightarrow{OP}$ is

$$\frac{1}{r}\,\overrightarrow{OP}.$$

Therefore

$$\nabla u = 3r^2\left(\frac{1}{r}\,\overrightarrow{OP}\right) = 3r\,\overrightarrow{OP}$$

in this case. This example illustrates how Theorems I and II sometimes furnish
the means of calculating a gradient without resorting to rectangular co-ordinate
systems. For a solution of this same problem in terms of rectangular
co-ordinates see Exercise 5.
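As a rough numerical cross-check of Example 2, the sketch below compares a central-difference gradient of $u = r^3$ with the vector $3r\,\overrightarrow{OP}$. The helper `grad`, the step size, and the sample point $(1, 2, 2)$ (for which $r = 3$) are assumptions made for illustration only.

```python
import math


def grad(f, p, h=1e-5):
    """Central-difference approximation to the gradient of f at p."""
    g = []
    for i in range(len(p)):
        plus = list(p)
        minus = list(p)
        plus[i] += h
        minus[i] -= h
        g.append((f(*plus) - f(*minus)) / (2 * h))
    return tuple(g)


def u(x, y, z):
    # u = r^3 with r the distance from the origin O.
    return (x * x + y * y + z * z) ** 1.5


P = (1.0, 2.0, 2.0)
r = math.sqrt(sum(c * c for c in P))        # r = 3 at this point
numeric = grad(u, P)
predicted = tuple(3 * r * c for c in P)     # 3 r OP, from Example 2
print(numeric, predicted)
```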


EXERCISES 

1. Find $\nabla f$ in the case of each of the following functions.

(a) $f(P) = 2x^2 - 3xy + y^2 - 4xz + 6z^2$.

(b) $f(P) = x/(x^2 + y^2) + z/(y^2 + z^2)$.

(c) $f(P) = \log(x^2 + y^2) + z$.

(d) $f(P) = z/(x^2 + y^2 + z^2)^{3/2}$.

(e) $f(P) = e^{-x}(x^2 + y^2 + z^2)$.

2. In what direction from the point $(2, -1, 3)$ is the function $xy + yz + zx + xyz$
increasing most rapidly? Give direction cosines for the answer.

3. Find a vector indicating the direction and magnitude of most rapid decrease of
the function $(x^2/16) + (y^2/25) - (z^2/9)$ at the point $(8, 25, -9)$. What is this rate of decrease?

4. (a) How fast is $4x^2 + 9y^2 + z^2$ changing at the point $(1, 1, 1)$ in a direction tangent
to the ellipsoid $4x^2 + 9y^2 + z^2 = 14$? (b) What is the rate of change in the direction of the
outward normal to the ellipsoid at the point?

5. Find $\nabla u$ by use of (10.6-3) if $u = (x^2 + y^2 + z^2)^{3/2}$. Compare with Example 2,
noting that $r^2 = x^2 + y^2 + z^2$ and $\overrightarrow{OP} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$.

6. Find $\nabla u$ if $u = x^2$, using Theorems I and II rather than (10.6-3). Check by using
the latter formula.

7. Let $\mathbf{R}$ be the vector $\overrightarrow{OP}$ and $r$ the distance $OP$. Find the gradient of each of the
following functions by the method of Example 2. Express the answers in terms of $\mathbf{R}$ and
$r$.

(a) $u = 1/r$; (d) $u = \log(1/r)$;

(b) $u = r^2$; (e) $u = r^n$;

(c) $u = 1/\sqrt{r}$; (f) $u = e^{-r^2}$.

8. In the notation of Exercise 7 let $u = F(r)$, where $F$ is a differentiable function of
$r$. Show that $\nabla u = (1/r)F'(r)\mathbf{R}$: (a) by the method of Example 2; (b) by using (10.6-3)
with $r^2 = x^2 + y^2 + z^2$, $\mathbf{R} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$. (c) Find the most general form for $F(r)$ if $\nabla u = -r^3\mathbf{R}$.

9. Let $r$ denote the distance from $P$ to a fixed line in space. (a) If $f(P)$ is a
function of $r$ alone, what are the level surfaces of the scalar field? (b) If $f(P) = F(r)$,
where $F$ is differentiable, what can you say about the magnitude and direction of $\nabla f$ at $P$
if $P$ is not on the given line? (c) Let the fixed line be the $z$-axis. If $P$ is the point
$(x, y, z)$, show that $\nabla f = [F'(r)/r](x\mathbf{i} + y\mathbf{j})$.

10. (a) If $u = f(P)$ and $v = g(P)$ are differentiable scalar fields, show that $\nabla(uv) = u\nabla v + v\nabla u$. (b) Use this result to find $\nabla(x/r^3)$.

11. Prove Theorem I along the following lines: Let $\mathbf{n}$ be a unit vector in a certain
direction at the point $P_0(x_0, y_0, z_0)$. If the direction cosines of the direction are $\cos\alpha$,
$\cos\beta$, $\cos\gamma$, then

$$\mathbf{n} = \cos\alpha\,\mathbf{i} + \cos\beta\,\mathbf{j} + \cos\gamma\,\mathbf{k}.$$

If $s$ is distance measured positively along the line through $P_0$ in the direction of $\mathbf{n}$, with
$s = 0$ at $P_0$, the line has parametric equations

$$x = x_0 + s\cos\alpha, \qquad y = y_0 + s\cos\beta, \qquad z = z_0 + s\cos\gamma.$$

Show that $\dfrac{du}{ds} = (\nabla u) \cdot \mathbf{n}$, and explain why this formula is equivalent to Theorem I.

12. If $\phi$ is colatitude and $\theta$ is longitude (both in radians) in a system of spherical
co-ordinates related to rectangular co-ordinates by the equations $x = r\sin\phi\cos\theta$, $y = r\sin\phi\sin\theta$, $z = r\cos\phi$, what are (a) the level surfaces for $u = \phi$? (b) the level
surfaces for $u = \theta$?

(c) Show that

$$\nabla\phi = \frac{\cos\theta\cos\phi}{r}\,\mathbf{i} + \frac{\sin\theta\cos\phi}{r}\,\mathbf{j} - \frac{\sin\phi}{r}\,\mathbf{k}.$$

This may be done either by using (10.6-3) or by careful use of Theorems I and II.

(d) Show that

$$\nabla\theta = -\frac{\sin\theta}{r\sin\phi}\,\mathbf{i} + \frac{\cos\theta}{r\sin\phi}\,\mathbf{j} = -\frac{y}{x^2 + y^2}\,\mathbf{i} + \frac{x}{x^2 + y^2}\,\mathbf{j}.$$

The remark in (c) applies here also.

13. Show that (10.6-2) holds if the xyz-system and the x'y'z'-system are related by a 
translation. In this case we have i = i', j = j', k = k', since the vector space with origin O' is 
considered to be indistinguishable from the vector space with origin O (see §10.1). Then 
discuss the case where both a translation and a rotation are needed to pass from one 
system to the other. 


10.7 / THE DIVERGENCE OF A VECTOR FIELD

Let

$$\mathbf{F} = F_1(x, y, z)\,\mathbf{i} + F_2(x, y, z)\,\mathbf{j} + F_3(x, y, z)\,\mathbf{k} \tag{10.7-1}$$

be a vector field with differentiable components. We remind the student that $F_1$,
$F_2$, $F_3$ are the components of $\mathbf{F}$ in the $xyz$-co-ordinate system. In another
rectangular co-ordinate system there will be another set of components. We also
observe that the subscripts 1, 2, 3 do not refer to partial differentiation, as they
did in Chapters 6 and 7. The subscripts are simply labels on the separate
functions.

The expression

$$\frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z} \tag{10.7-2}$$


is of interest in relation to the vector field, for it has an important physical 
meaning in certain kinds of vector fields occurring in physical theories. It also 
occurs in a very important purely mathematical theorem which we shall study 
later (the divergence theorem, §15.6). The importance of the expression (10.7-2) 
is closely connected with the fact that it is a scalar invariant , in the following 
sense: the expression (10.7-2) is unchanged in value if we make any rigid motion 
of the axes (see formula (10.7-4), further on). We shall deal with the case of a 
rotation of axes, leaving the simpler case of a translation for the student. Let the 
notation for the vector field in the rotated system be 


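$$\mathbf{F} = F_1'(x', y', z')\,\mathbf{i}' + F_2'(x', y', z')\,\mathbf{j}' + F_3'(x', y', z')\,\mathbf{k}'.$$

Then, by (10.3-6) we have

$$F_1' = l_1 F_1 + m_1 F_2 + n_1 F_3, \tag{10.7-3}$$

and other similar formulas. The $xyz$-system is related to the $x'y'z'$-system by
equations (10.3-4) or (10.3-5). Now, by (10.7-3),

$$\frac{\partial F_1'}{\partial x'} = l_1\frac{\partial F_1}{\partial x'} + m_1\frac{\partial F_2}{\partial x'} + n_1\frac{\partial F_3}{\partial x'}.$$

Also, by the chain rule,

$$\frac{\partial F_1}{\partial x'} = \frac{\partial F_1}{\partial x}\frac{\partial x}{\partial x'} + \frac{\partial F_1}{\partial y}\frac{\partial y}{\partial x'} + \frac{\partial F_1}{\partial z}\frac{\partial z}{\partial x'} = l_1\frac{\partial F_1}{\partial x} + m_1\frac{\partial F_1}{\partial y} + n_1\frac{\partial F_1}{\partial z},$$

with similar formulas for $\dfrac{\partial F_2}{\partial x'}$ and $\dfrac{\partial F_3}{\partial x'}$. In the same way,

$$\frac{\partial F_2'}{\partial y'} = l_2\frac{\partial F_1}{\partial y'} + m_2\frac{\partial F_2}{\partial y'} + n_2\frac{\partial F_3}{\partial y'},$$

$$\frac{\partial F_1}{\partial y'} = l_2\frac{\partial F_1}{\partial x} + m_2\frac{\partial F_1}{\partial y} + n_2\frac{\partial F_1}{\partial z},$$

and so on. The student should himself write out all the formulas. From $\dfrac{\partial F_1'}{\partial x'}$ we
obtain nine terms:

$$\begin{aligned}
\frac{\partial F_1'}{\partial x'} ={}& l_1^2\frac{\partial F_1}{\partial x} + l_1 m_1\frac{\partial F_1}{\partial y} + l_1 n_1\frac{\partial F_1}{\partial z} \\
&+ m_1 l_1\frac{\partial F_2}{\partial x} + m_1^2\frac{\partial F_2}{\partial y} + m_1 n_1\frac{\partial F_2}{\partial z} \\
&+ n_1 l_1\frac{\partial F_3}{\partial x} + n_1 m_1\frac{\partial F_3}{\partial y} + n_1^2\frac{\partial F_3}{\partial z}.
\end{aligned}$$

The formulas for $\dfrac{\partial F_2'}{\partial y'}$ and $\dfrac{\partial F_3'}{\partial z'}$ are obtained by advancing the subscripts to 2 and
3 respectively on $l$, $m$, and $n$. Now

$$l_1^2 + l_2^2 + l_3^2 = 1,$$

$$l_1 m_1 + l_2 m_2 + l_3 m_3 = 0,$$

and so on. Hence it may be seen that

$$\frac{\partial F_1'}{\partial x'} + \frac{\partial F_2'}{\partial y'} + \frac{\partial F_3'}{\partial z'} = \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}. \tag{10.7-4}$$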


This proves the invariance of (10.7-2). 
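The invariance (10.7-4) can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the vector field `F`, the two rotation angles, and the sample point are invented for the purpose, and derivatives are approximated by central differences. The rows of the matrix `A` play the role of the direction cosines $(l_i, m_i, n_i)$ of $\mathbf{i}'$, $\mathbf{j}'$, $\mathbf{k}'$.

```python
import math


def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]


def divergence(F, p, h=1e-5):
    """Sum of the three diagonal partial derivatives, by central differences."""
    total = 0.0
    for i in range(3):
        plus = list(p)
        minus = list(p)
        plus[i] += h
        minus[i] -= h
        total += (F(plus)[i] - F(minus)[i]) / (2 * h)
    return total


def F(p):
    # Hypothetical sample field; its divergence is y + z^2 + 1.
    x, y, z = p
    return [x * y, y * z * z, x + z]


a, b = 0.7, -0.4  # arbitrary rotation angles
Rz = [[math.cos(a), math.sin(a), 0.0], [-math.sin(a), math.cos(a), 0.0], [0.0, 0.0, 1.0]]
Rx = [[1.0, 0.0, 0.0], [0.0, math.cos(b), math.sin(b)], [0.0, -math.sin(b), math.cos(b)]]
A = matmul(Rx, Rz)  # rows: direction cosines of i', j', k'


def F_primed(q):
    # q holds primed co-ordinates: convert to unprimed, evaluate F, rotate the components.
    return matvec(A, F(matvec(transpose(A), q)))


p = [0.3, -1.2, 0.8]
q = matvec(A, p)                      # the same point in primed co-ordinates
div_unprimed = divergence(F, p)
div_primed = divergence(F_primed, q)
print(div_unprimed, div_primed)
```

Both numbers should agree with each other and with the exact value $y + z^2 + 1 = 0.44$ at the chosen point.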


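Definition. The scalar function (10.7-2) is called the divergence of the vector
field (10.7-1). It is denoted by $\operatorname{div}\mathbf{F}$:

$$\operatorname{div}\mathbf{F} = \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}. \tag{10.7-5}$$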


Observe that divF is a scalar field associated with the vector field F. We 
shall not try at this point to display the mathematical or physical importance of 
the concept of divergence. We remark merely that certain kinds of vector fields
have the property that their divergence is everywhere zero. Such fields are called 
solenoidal. Among the most important solenoidal vector fields are those fields of 
force which are produced by the inverse square law of attraction or repulsion, 
e.g., gravitational or electrostatic fields. Also, if F denotes the velocity field of an 
incompressible fluid in a steady state of flow, the incompressibility is expressed 
by the fact that div F = 0. 

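Example 1. The electrostatic field produced by a unit positive charge at $O$ is

$$\mathbf{E} = \frac{1}{r^3}\,\overrightarrow{OP},$$

where $r$ is the distance $OP$. We shall show that the divergence of this field is zero wherever
the field is defined (at all points except $O$). In a rectangular co-ordinate system
let $P$ have co-ordinates $x, y, z$. Then

$$r^2 = x^2 + y^2 + z^2, \qquad \overrightarrow{OP} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}.$$

Hence the components of $\mathbf{E}$ are

$$E_1 = \frac{x}{r^3}, \qquad E_2 = \frac{y}{r^3}, \qquad E_3 = \frac{z}{r^3}.$$

Now

$$\frac{\partial E_1}{\partial x} = \frac{r^3 - 3r^2 x\,\dfrac{\partial r}{\partial x}}{r^6}.$$

To find $\dfrac{\partial r}{\partial x}$ we have

$$2r\frac{\partial r}{\partial x} = 2x, \qquad \frac{\partial r}{\partial x} = \frac{x}{r}.$$

Hence

$$\frac{\partial E_1}{\partial x} = \frac{r^3 - 3rx^2}{r^6} = \frac{r^2 - 3x^2}{r^5}.$$

By symmetry,

$$\frac{\partial E_2}{\partial y} = \frac{r^2 - 3y^2}{r^5}, \qquad \frac{\partial E_3}{\partial z} = \frac{r^2 - 3z^2}{r^5};$$

therefore

$$\operatorname{div}\mathbf{E} = \frac{3r^2 - 3(x^2 + y^2 + z^2)}{r^5} = 0.$$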
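The vanishing divergence of the inverse-square field can also be observed numerically. The following sketch assumes an arbitrary sample point away from $O$ and approximates the three partial derivatives by central differences; the helper names are illustrative, not from the text.

```python
import math


def E(p):
    # Field of a unit positive charge at the origin: E = OP / r^3.
    x, y, z = p
    r = math.sqrt(x * x + y * y + z * z)
    return (x / r**3, y / r**3, z / r**3)


def divergence(F, p, h=1e-5):
    """dF1/dx + dF2/dy + dF3/dz by central differences."""
    total = 0.0
    for i in range(3):
        plus = list(p)
        minus = list(p)
        plus[i] += h
        minus[i] -= h
        total += (F(plus)[i] - F(minus)[i]) / (2 * h)
    return total


div_E = divergence(E, (1.0, 2.0, 2.0))
print(div_E)  # expected to be zero up to discretization and rounding error
```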


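There is another common notation for the divergence; it is

$$\operatorname{div}\mathbf{F} = \nabla \cdot \mathbf{F}. \tag{10.7-6}$$

The reason for this notation is one of formal appearance. The symbol $\nabla$ was
introduced in the notation for the gradient (see (10.6-3)). The symbol $\nabla$ by itself
is sometimes expressed as

$$\nabla = \mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y} + \mathbf{k}\frac{\partial}{\partial z}.$$

In this form $\nabla$ (which is read as “del”) is called a vector differential operator. We
say that the components of the operator $\nabla$ in this particular co-ordinate system
are

$$\frac{\partial}{\partial x}, \quad \frac{\partial}{\partial y}, \quad \frac{\partial}{\partial z}.$$

Recalling the formula

$$\mathbf{A} \cdot \mathbf{B} = A_1 B_1 + A_2 B_2 + A_3 B_3$$

for the dot product of two vectors, we see that appearances would lead us to
write

$$\nabla \cdot \mathbf{F} = \frac{\partial}{\partial x}F_1 + \frac{\partial}{\partial y}F_2 + \frac{\partial}{\partial z}F_3.$$

The expression on the right here is in fact $\operatorname{div}\mathbf{F}$, if the “product” of the symbols
$\dfrac{\partial}{\partial x}F_1$ is understood to mean the derivative $\dfrac{\partial F_1}{\partial x}$, and so on. We thus have a
justification of the notation (10.7-6).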

Particular interest attaches to the divergence of the gradient of a scalar field.
If the scalar field is $u = f(P)$ we see that

$$\operatorname{div}(\operatorname{grad} u) = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}. \tag{10.7-8}$$

In the $\nabla$ notation

$$\operatorname{div}(\operatorname{grad} u) = \nabla \cdot \nabla u.$$

It is customary to write $\nabla \cdot \nabla = \nabla^2$, so that

$$\nabla^2 u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}. \tag{10.7-9}$$

The left member of (10.7-9) is read as “del-squared of $u$,” or “del-squared $u$.”
The equation

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = 0$$

is of fundamental importance in many branches of applied mathematics. It is
known as Laplace's equation, in honor of the researches of the famous French
mathematician Pierre Simon de Laplace (1749-1827). Accordingly the expression
$\nabla^2 u$ is often called the Laplacian of $u$.

Since the gradient and the divergence are both invariants with respect to
rigid motions of the axes, it follows from (10.7-8) that $\nabla^2 u$ is a scalar invariant
associated with the field $u$. In other words,

$$\frac{\partial^2 u}{\partial x'^2} + \frac{\partial^2 u}{\partial y'^2} + \frac{\partial^2 u}{\partial z'^2} = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2},$$

if the $x'y'z'$-system is obtained from the $xyz$-system by a rigid motion.

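Example 2. The electrostatic potential at $P$ arising from a dipole of unit
strength at $O$, oriented in the direction of the unit vector $\mathbf{n}$, is

$$V = \frac{\mathbf{n} \cdot \overrightarrow{OP}}{r^3} = \frac{\cos\theta}{r^2},$$

where $r = OP$ (see Fig. 82). The electrostatic field itself is
$\mathbf{E} = -\nabla V$. Let us calculate $\nabla \cdot \mathbf{E} = -\nabla^2 V$. We choose a
co-ordinate system with origin $O$ so that $\mathbf{n}$ coincides with the
direction of the positive $z$-axis. Then

$$\mathbf{n} = \mathbf{k}, \qquad \overrightarrow{OP} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}, \qquad r^2 = x^2 + y^2 + z^2.$$

Hence $\mathbf{n} \cdot \overrightarrow{OP} = z$, and

$$V = \frac{z}{r^3}.$$

Fig. 82.

Then

$$\frac{\partial V}{\partial x} = -3zr^{-4}\frac{\partial r}{\partial x}, \qquad 2r\frac{\partial r}{\partial x} = 2x, \qquad \frac{\partial V}{\partial x} = -\frac{3xz}{r^5}.$$

Likewise, by symmetry,

$$\frac{\partial V}{\partial y} = -\frac{3yz}{r^5}.$$

Next

$$\frac{\partial V}{\partial z} = r^{-3} - 3zr^{-4}\frac{\partial r}{\partial z} = \frac{1}{r^3} - \frac{3z^2}{r^5},$$

$$\frac{\partial^2 V}{\partial x^2} = -\frac{3z}{r^5} + \frac{15x^2 z}{r^7},$$

and by symmetry we can write down a similar expression for $\dfrac{\partial^2 V}{\partial y^2}$. Finally,

$$\frac{\partial^2 V}{\partial z^2} = -3r^{-4}\frac{\partial r}{\partial z} - \frac{6z}{r^5} + 15z^2 r^{-6}\frac{\partial r}{\partial z} = -\frac{3z}{r^5} - \frac{6z}{r^5} + \frac{15z^3}{r^7}.$$

Collecting, we see that

$$\nabla^2 V = -3z\left(\frac{5}{r^5} - \frac{5(x^2 + y^2 + z^2)}{r^7}\right) = 0.$$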
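A numerical spot-check of the dipole example is sketched below. It assumes the co-ordinate choice made in the example (so that $V = z/r^3$), an arbitrary sample point, and second-order central differences for the second partial derivatives; the helper `laplacian` and its step size are illustrative assumptions.

```python
import math


def V(x, y, z):
    # Dipole potential with n along the positive z-axis: V = z / r^3.
    r = math.sqrt(x * x + y * y + z * z)
    return z / r**3


def laplacian(f, p, h=1e-3):
    """Sum of the three unmixed second partial derivatives, by central differences."""
    center = f(*p)
    total = 0.0
    for i in range(3):
        plus = list(p)
        minus = list(p)
        plus[i] += h
        minus[i] -= h
        total += (f(*plus) - 2 * center + f(*minus)) / (h * h)
    return total


value = laplacian(V, (1.0, 2.0, 2.0))
print(value)  # expected to be zero up to discretization and rounding error
```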
EXERCISES

1. Show that, if $u$ and $\mathbf{F}$ are differentiable scalar and vector fields, respectively, then

$$\nabla \cdot (u\mathbf{F}) = \nabla u \cdot \mathbf{F} + u\,\nabla \cdot \mathbf{F}.$$

2. Show that, granted sufficient differentiability,

$$\nabla \cdot (u\nabla v) = u\nabla^2 v + \nabla u \cdot \nabla v.$$

3. (a) Let $\mathbf{F} = z\mathbf{i}$. If this vector field is interpreted to mean that $\mathbf{F}$ at $(x, y, z)$ is the
velocity of the particle of a fluid at that point, describe in words the nature of the motion
of the fluid. Find $\nabla \cdot \mathbf{F}$. What is happening to the volume of the part of the fluid which at a
given instant occupies the cube with sides $x = 0$, $x = 1$, $y = 0$, $y = 1$, $z = 0$, $z = 1$?

(b) As in (a), interpret $\mathbf{F} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$ as the velocity at $(x, y, z)$ in a fluid flow. What
is the nature of the motion? Find $\nabla \cdot \mathbf{F}$. Let $V$ be the volume of the part of the fluid which
at $t = 0$ occupies the sphere of radius $r$ and center $O$. Find $\dfrac{dr}{dt}$ and $\dfrac{dV}{dt}$ in terms of $r$ and
show that $\dfrac{1}{V}\dfrac{dV}{dt} = \nabla \cdot \mathbf{F}$. This special situation gives some hint of the meaning of $\nabla \cdot \mathbf{F}$ in
relation to expansion or contraction of volumes in an arbitrary fluid flow.

4. (a) In Exercise 8, §10.6, it was shown that for a scalar field of the form $u = F(r)$,
$\nabla u = \dfrac{1}{r}F'(r)\mathbf{R}$, where $r = OP$ and $\mathbf{R} = \overrightarrow{OP}$. Using this result and the formula in Exercise 1,
show that

$$\nabla^2 u = F''(r) + \frac{2}{r}F'(r).$$

(b) Find $F(r)$ if $\nabla^2 u = 0$ when $r > 0$.

5. (a) Using the notation $r$ and $\mathbf{R}$ as in Exercise 4, let $\mathbf{F} = \phi(r)\mathbf{R}$ and show that
$\nabla \cdot \mathbf{F} = r\phi'(r) + 3\phi(r)$. (b) Find $\phi(r)$ if $\nabla \cdot \mathbf{F} = 0$ when $r > 0$. Show that in this case $\mathbf{F}$ can be
interpreted as a force of attraction toward, or repulsion from, O according to the inverse 
square law. 

6. (a) If $\mathbf{F} = \mathbf{A} \times \overrightarrow{OP}$, where $\mathbf{A}$ is any constant vector field, show that $\nabla \cdot \mathbf{F} = 0$.
Suggestion: Choose axes so that $\mathbf{A}$ has the direction of the $z$-axis. This is not essential,
but it simplifies the work.

(b) If a rigid body is rotating around the $z$-axis with angular speed $\omega$, rotation in the
$xy$-plane being in the counterclockwise sense as viewed by an observer looking down on
the plane from the positive $z$-axis, the velocity of any point $P$ of the body is $\mathbf{V} = \omega\mathbf{k} \times \overrightarrow{OP}$.
Verify this. Write out the components of $\mathbf{V}$ and compute $\nabla \cdot \mathbf{V}$.

7. If $u = ax^2 + by^2 + cz^2$, find $\nabla^2 u$. If $u$ is a general homogeneous polynomial of
degree two in $x, y, z$, what is the condition on its coefficients necessary and sufficient to
insure $\nabla^2 u = 0$?


10.8 / THE CURL OF A VECTOR FIELD

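We have seen that $\operatorname{div}\mathbf{F}$ is a scalar invariant which appears formally as the dot
product $\nabla \cdot \mathbf{F}$ of the vector operator symbol $\nabla$ and the vector $\mathbf{F}$. Let us now ask
what we obtain formally from the cross product $\nabla \times \mathbf{F}$. By analogy with (10.2-7)
we write

$$\nabla \times \mathbf{F} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\[4pt] \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\[4pt] F_1 & F_2 & F_3 \end{vmatrix}.$$

In expanded form

$$\nabla \times \mathbf{F} = \left(\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z}\right)\mathbf{i} + \left(\frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x}\right)\mathbf{j} + \left(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right)\mathbf{k}. \tag{10.8-2}$$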


The first question about this expression is: Is it invariant with respect to rigid
motions of the axes? The student who has thoroughly understood the discussions
of the invariance of the gradient and the divergence will appreciate the
import of the present question; he will also be ready to carry through the proof
of invariance of $\nabla \times \mathbf{F}$ on the basis of the experience he has acquired. In §10.7
we have seen how to deal with the transformation of partial derivatives of the
components of $\mathbf{F}$ under a rotation of the axes. We have to prove three relations,
of which the relation

$$\frac{\partial F_3'}{\partial y'} - \frac{\partial F_2'}{\partial z'} = l_1\left(\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z}\right) + m_1\left(\frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x}\right) + n_1\left(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right) \tag{10.8-3}$$

is typical. To verify (10.8-3) we calculate $\dfrac{\partial F_3'}{\partial y'}$ and $\dfrac{\partial F_2'}{\partial z'}$ by the same methods used
to calculate $\dfrac{\partial F_1'}{\partial x'}$ in the discussion that follows (10.7-3). The result thus obtained
for the left side of (10.8-3) can be matched up with the right side by making
suitable replacements for $l_1$, $m_1$, $n_1$. We can replace $l_1$ by $m_2 n_3 - m_3 n_2$ because of
the fact that $\mathbf{i}' = \mathbf{j}' \times \mathbf{k}'$, when we look at the components of these latter vectors in
the $xyz$-system. There are corresponding replacements for $m_1$ and $n_1$. We leave it
to the student to complete the work.

The expression $\nabla \times \mathbf{F}$ defined in (10.8-2) is a vector invariant. Its importance
is on the same level as that of the divergence, and for much the same reasons.
We call $\nabla \times \mathbf{F}$ the curl of the field $\mathbf{F}$, and write

$$\operatorname{curl}\mathbf{F} = \nabla \times \mathbf{F}.$$

The physical or geometrical significance of the curl cannot be explained in a few
lines. Certain kinds of vector fields have the property that their curl is everywhere
the vector $\mathbf{0}$. Such fields are called irrotational. This name comes from
hydrodynamics, and is there used to describe the vector field of velocities of
particles in certain types of fluid motion. The antithesis of irrotational fluid
motion is vortex motion, of which a whirlpool affords the simplest example. In
the physical applications to fields of force, the irrotational fields are the
conservative fields, i.e., the fields in which a principle of conservation of energy
applies.
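Example 1. Consider the vector field $\mathbf{V} = \omega\mathbf{k} \times \overrightarrow{OP}$ representing the
velocities of points $P$ in a rigid body rotating about the $z$-axis (see Exercise 6(b),
§10.7). We have

$$\mathbf{V} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 0 & 0 & \omega \\ x & y & z \end{vmatrix} = -\omega y\,\mathbf{i} + \omega x\,\mathbf{j}.$$

Here $V_1 = -\omega y$, $V_2 = \omega x$, $V_3 = 0$. Hence

$$\nabla \times \mathbf{V} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\[4pt] \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\[4pt] -\omega y & \omega x & 0 \end{vmatrix},$$

$$\nabla \times \mathbf{V} = 2\omega\mathbf{k}.$$

Thus the curl of the velocity field is a vector in the direction of the axis of
rotation, having a magnitude equal to twice the angular speed of rotation.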
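The conclusion of Example 1 can be confirmed with a small numerical experiment. In the sketch below the helper `curl` implements the expanded form (10.8-2) with central differences; the angular speed `omega` and the sample point are arbitrary illustrative choices, not values from the text.

```python
def curl(F, p, h=1e-5):
    """Components of curl F at p, following (10.8-2), by central differences."""
    def d(i, j):
        # Partial derivative of component i of F with respect to co-ordinate j.
        plus = list(p)
        minus = list(p)
        plus[j] += h
        minus[j] -= h
        return (F(plus)[i] - F(minus)[i]) / (2 * h)

    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))


omega = 1.5  # hypothetical angular speed


def velocity(p):
    # V = omega k x OP = -omega*y i + omega*x j
    x, y, z = p
    return (-omega * y, omega * x, 0.0)


result = curl(velocity, (0.4, -1.0, 2.0))
print(result)  # approximately (0, 0, 2*omega)
```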


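Example 2. Let $\mathbf{F} = \dfrac{c}{r^3}\,\overrightarrow{OP}$, where $c$ is a constant and $r$ is the distance $OP$.
This may be interpreted as the gravitational field of a mass concentrated at $O$
(compare with Example 2, §10.51). Show that

$$\nabla \times \mathbf{F} = \mathbf{0}.$$

The components of $\mathbf{F}$ are

$$F_1 = \frac{cx}{r^3}, \qquad F_2 = \frac{cy}{r^3}, \qquad F_3 = \frac{cz}{r^3},$$

and $r^2 = x^2 + y^2 + z^2$. Now

$$\frac{\partial F_2}{\partial x} = cy\left(-3r^{-4}\frac{\partial r}{\partial x}\right) = cy\,(-3r^{-4})\,\frac{x}{r} = -\frac{3cxy}{r^5}.$$

By a symmetrical calculation

$$\frac{\partial F_1}{\partial y} = -\frac{3cxy}{r^5}.$$

Hence the $z$-component of $\nabla \times \mathbf{F}$ is

$$\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} = 0.$$

The other components also vanish, by symmetry.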


EXERCISES 

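1. If $\mathbf{F} = \nabla u$, show that $\nabla \times \mathbf{F} = \mathbf{0}$ at all points of the field. Assume that $u$ is twice
differentiable.

2. Show that $\nabla \cdot (\nabla \times \mathbf{F}) = 0$ in the case of any twice differentiable vector field.

3. Find $\nabla \times \mathbf{F}$ in each case.

(a) $\mathbf{F} = 2xz\,\mathbf{i} + 2yz^2\,\mathbf{j} + (x^2 + 2y^2 z - 1)\,\mathbf{k}$.

(b) $\mathbf{F} = ax\,\mathbf{i} + by\,\mathbf{j} + cz\,\mathbf{k}$.

(c) $\mathbf{F} = y\,\mathbf{i} + z\,\mathbf{j} + x\,\mathbf{k}$.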

, xn F _*i + yj 

(d) 

(e)F = ^l- 


(f) F = - 


y 

xz 


V x*+y 2 


yz 


Vx 2 + y 5 


j + VP+Tk. 


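4. If $u$ and $\mathbf{F}$ are differentiable scalar and vector fields, respectively, show that
$\nabla \times (u\mathbf{F}) = u(\nabla \times \mathbf{F}) + \nabla u \times \mathbf{F}$.

5. If $\mathbf{E}$ and $\mathbf{F}$ are differentiable vector fields, show that

$$\nabla \cdot (\mathbf{E} \times \mathbf{F}) = \mathbf{F} \cdot (\nabla \times \mathbf{E}) - \mathbf{E} \cdot (\nabla \times \mathbf{F}).$$

6. If $\mathbf{F} = \mathbf{A} \times \overrightarrow{OP}$, where $\mathbf{A}$ is a constant vector field, show that $\nabla \times \mathbf{F} = 2\mathbf{A}$.

7. Let $\mathbf{R} = \overrightarrow{OP}$, $r = \|\mathbf{R}\|$. (a) Find $\nabla \times \mathbf{R}$. (b) If $\mathbf{F} = \phi(r)\mathbf{R}$, where $\phi$ is differentiable,
show that $\nabla \times \mathbf{F} = \mathbf{0}$.

8. If $\mathbf{A}$ is a constant vector field, and $r$, $\mathbf{R}$ have meanings as in Exercise 7, show that
$\nabla \times (r^n \mathbf{A} \times \mathbf{R}) = (n + 2)r^n\mathbf{A} - nr^{n-2}(\mathbf{A} \cdot \mathbf{R})\mathbf{R}$.

9. Write out all the details of the proof of (10.8-3).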


MISCELLANEOUS EXERCISES 

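1. Let $\mathbf{A}$ and $\mathbf{B}$ be constant, and let $\mathbf{R} = \overrightarrow{OP}$, $\mathbf{E} = \mathbf{R} - \mathbf{A}$, $\mathbf{F} = \mathbf{R} - \mathbf{B}$. Show
that (a) $\nabla \cdot (\mathbf{E} \times \mathbf{F}) = 0$, (b) $\nabla \times (\mathbf{E} \times \mathbf{F}) = 2(\mathbf{B} - \mathbf{A})$, (c) $\nabla(\mathbf{E} \cdot \mathbf{F}) = \mathbf{E} + \mathbf{F}$.

2. Let $\mathbf{R} = \overrightarrow{OP}$, $r = \|\mathbf{R}\|$, and let $\mathbf{A}$ be a constant vector.

(a) Find $\nabla \cdot (r^n\mathbf{A})$ and $\nabla \times (r^n\mathbf{A})$.

(b) Find $\nabla \cdot (r^n\mathbf{A} \times \mathbf{R})$.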

(c) Show that = -V x (^r^). 

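3. With the notation of Exercise 2, and the additional assumption that $\mathbf{A}$ has unit
length, show that

(a) $\nabla \cdot [(\mathbf{A} \cdot \mathbf{R})\mathbf{A}] = 1$, (c) $\nabla \cdot [(\mathbf{A} \times \mathbf{R}) \times \mathbf{A}] = 2$,

(b) $\nabla \times [(\mathbf{A} \cdot \mathbf{R})\mathbf{A}] = \mathbf{0}$, (d) $\nabla \times [(\mathbf{A} \times \mathbf{R}) \times \mathbf{A}] = \mathbf{0}$.

4. Find $\nabla \cdot (\nabla u \times \nabla v)$.

5. If $\mathbf{A}$ is a constant vector, show that $\nabla \cdot (\mathbf{A} \times \mathbf{F}) = -\mathbf{A} \cdot (\nabla \times \mathbf{F})$.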



11 / LINEAR TRANSFORMATIONS


11 / INTRODUCTION 

The development of vector mathematics presented in the preceding chapter took 
place mostly during the second half of the nineteenth century. One of the leading 
contributors to the development was the American genius Josiah Willard Gibbs 
of Yale University. The first part of Chapter 10, which deals with the ways 
vectors combine with other vectors and with scalars (real numbers), is called 
vector algebra. During the twentieth century, vector algebra expanded vastly 
into a subject called linear algebra, which has important applications in much of 
mathematics, especially advanced calculus. 

This chapter will present enough linear algebra to enable us to use the 
subject to unify and extend the results obtained in the last several chapters. 
Prerequisite to full understanding of this chapter and the next is some knowledge 
of simultaneous linear systems involving $n$ equations in $n$ unknowns. In particular,
we shall use the following fact: A necessary and sufficient condition that
such a system have a unique solution is that the determinant of the coefficient 
matrix be different from zero. In Chapter 12 we occasionally use some of the 
most elementary rules for computing determinants. 

A big step in the transition from vector algebra to linear algebra is the 
realization that functions from a vector space to a vector space are in themselves
examples of vectors. The first part of this chapter will be devoted to
explaining in detail what this means. We can begin with the once popular 
question, “What is a vector?” The common reply used to be that a vector is a 
quantity having both magnitude and direction. We shall soon see that neither 
magnitude nor direction is essential to vectors, and that most things having both 
(trains, for example) are not vectors. No satisfactory answer was arrived at until 
it was realized that one should not try to define a vector as an object having 
certain qualities, but rather as a member of a family of objects governed by 
certain rules. In particular, the important considerations are how to combine a 
vector with other vectors and with scalars through certain operations. 

The problem is much like that of defining a checker. We might naturally begin 
by saying that it is a wooden or plastic disk, but such a beginning does not lead to 
anything satisfactory. If one were asked whether the top of a soft drink bottle is 
a checker, one’s first inclination would probably be to say no. Yet many games 
of checkers have been played with these objects. The important fact which 
emerges from all this is that there is no property intrinsic to a thing considered in 
isolation which can settle the question of whether it is or is not a checker. The 
answer depends entirely on whether it is a member of a set of objects which are
moved according to certain well-defined rules. 

Similarly, a mathematical object is a vector if and only if it is a member of a
set whose members combine with each other and with real numbers according to
certain well-defined rules, namely the rules of vector algebra. These rules have already
been encountered in our discussion of $R^3$ in §10.1. We shall collect and present
them here as a set of abstract rules. The advantage of this is that it will enable us
to apply them to sets of objects which we would be slow to recognize as vectors
if we continued to associate vectors with physical or geometrical quantities. The
rules actually apply to two sets: the set of real numbers $R$, whose elements we
shall refer to as scalars, and the set $\mathcal{V}$ of vectors. In stating the rules we shall
distinguish between the scalars and the vectors by using boldface type to refer to
the latter.

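On the set of vectors $\mathcal{V}$, there must be a binary operation, denoted by + and
called addition, which is commutative and associative, i.e., if $\mathbf{x}$ and $\mathbf{y}$ belong to
$\mathcal{V}$, then their sum $\mathbf{x} + \mathbf{y}$ also belongs to $\mathcal{V}$ and

$$\mathbf{x} + \mathbf{y} = \mathbf{y} + \mathbf{x}$$

and

$$\mathbf{x} + (\mathbf{y} + \mathbf{z}) = (\mathbf{x} + \mathbf{y}) + \mathbf{z}$$

for all $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ in $\mathcal{V}$.

There is a zero vector, denoted by $\mathbf{0}$, with the property that
$\mathbf{x} + \mathbf{0} = \mathbf{x}$ for all $\mathbf{x}$ in $\mathcal{V}$.

The fact that this vector is denoted by the same symbol as the real number zero
means that the student must be alert from now on to infer the correct meaning
from the context in which the symbol occurs.

For each vector $\mathbf{a}$ in $\mathcal{V}$, there is a unique vector $\mathbf{x}$ such that $\mathbf{a} + \mathbf{x} = \mathbf{0}$. This
vector $\mathbf{x}$ is denoted by $-\mathbf{a}$, of course.

There is an operation called multiplication of vectors by scalars, which is
denoted merely by juxtaposition and which has the following properties.

$c\mathbf{x} = \mathbf{x}c$ is in $\mathcal{V}$ for all $\mathbf{x}$ in $\mathcal{V}$ and all $c$ in $R$.

$c(\mathbf{x} + \mathbf{y}) = c\mathbf{x} + c\mathbf{y}$ for all $\mathbf{x}$ and $\mathbf{y}$ in $\mathcal{V}$ and all $c$ in $R$.

$(a + b)\mathbf{x} = a\mathbf{x} + b\mathbf{x}$ for all $a$ and $b$ in $R$ and all $\mathbf{x}$ in $\mathcal{V}$.

$(ab)\mathbf{x} = a(b\mathbf{x})$ for all $a$ and $b$ in $R$ and all $\mathbf{x}$ in $\mathcal{V}$.

$1\mathbf{x} = \mathbf{x}$ for all $\mathbf{x}$ in $\mathcal{V}$.

$0\mathbf{x} = \mathbf{0}$ for all $\mathbf{x}$ in $\mathcal{V}$.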


Notice also that neither multiplication nor division of one vector by another 
is necessarily defined. In many applications of vectors it is convenient to define a 
dot product (frequently called a scalar product) like that in Chapter 10. And in 
the study of three-dimensional space the vector product has been seen to be 
useful, but no concept of multiplying one vector by another is essential in 
defining vectors, and nowhere will we have occasion to define division for 
vectors. 
We pause here to recall the use of the set membership symbol $\in$, already
introduced in §2.7. If $\mathcal{S}$ is any set, the notation $a \in \mathcal{S}$ means “$a$ is a member (or
element) of $\mathcal{S}$.” It can be variously read as “$a$ belongs to $\mathcal{S}$,” or “$a$ is in $\mathcal{S}$.”
Other slight variations of the verbal rendering of the notation are useful. For
instance, we may write one of the foregoing rules about vectors as follows:
$(ab)\mathbf{x} = a(b\mathbf{x})$ for each $a, b \in R$ and each $\mathbf{x} \in \mathcal{V}$. In this context $\in$ may be read
as “in” or “belonging to.”

The entire mathematical structure made up of $\mathcal{V}$, $R$, the two operations of
vector addition and multiplication of a vector by a scalar, and the rules
governing them is properly called a vector space. To avoid prolixity, however,
this term is frequently applied just to the set $\mathcal{V}$ when the rest of the structure is
understood. The expressions linear space and linear vector space are frequently
used as synonyms for vector space.

Example 1. Let $\mathcal{V}$ denote the set $R^n$ of all ordered $n$-tuples of real numbers,
with addition and scalar multiplication defined as follows: If $\mathbf{x} = (x_1, x_2, \ldots, x_n)$
and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ then

$$\mathbf{x} + \mathbf{y} = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n),$$

and if $c$ is any scalar, $c\mathbf{x} = (cx_1, cx_2, \ldots, cx_n)$. It can easily be verified that $\mathcal{V}$
together with these operations constitutes a vector space. We discussed $R^n$ in
§10.12. Remember that the zero vector $\mathbf{0}$ is $(0, 0, \ldots, 0)$. In Chapter 12 we shall
study functions with domain in $R^n$ and range in $R^m$, where $n$ and $m$ may or may
not be the same.
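The operations of Example 1 are easy to model, and the rules of vector algebra can then be spot-checked on sample vectors. The sketch below is an illustration only: the function names `add` and `scale` and the particular integer vectors and scalars are assumptions, and checking a few samples is of course not a proof that the rules hold in general.

```python
def add(x, y):
    """Componentwise sum of two n-tuples."""
    return tuple(a + b for a, b in zip(x, y))


def scale(c, x):
    """Product of the scalar c and the n-tuple x."""
    return tuple(c * a for a in x)


# Sample vectors in R^3 and sample scalars (integers keep the comparisons exact).
x, y, z = (1, 2, 3), (4, -5, 6), (0, 7, -1)
zero = (0, 0, 0)
a, b = 2, -3

checks = {
    "x + y = y + x": add(x, y) == add(y, x),
    "x + (y + z) = (x + y) + z": add(x, add(y, z)) == add(add(x, y), z),
    "x + 0 = x": add(x, zero) == x,
    "x + (-x) = 0": add(x, scale(-1, x)) == zero,
    "c(x + y) = cx + cy": scale(a, add(x, y)) == add(scale(a, x), scale(a, y)),
    "(a + b)x = ax + bx": scale(a + b, x) == add(scale(a, x), scale(b, x)),
    "(ab)x = a(bx)": scale(a * b, x) == scale(a, scale(b, x)),
    "1x = x": scale(1, x) == x,
    "0x = 0": scale(0, x) == zero,
}
for rule, holds in checks.items():
    print(rule, holds)
```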

Example 2. Here, V is the set of all real-valued continuous functions defined 
on [0, 1]. The reader already knows how to add two functions and how to 
multiply one by a scalar. He also knows that these operations, performed on 
continuous functions, always give continuous functions. We merely wish to 
point out that this long-familiar structure is a vector space, even though we have 
said nothing about “magnitude and direction” of the vectors. 

Example 3. Let $\mathcal{V}$ stand for the collection of all $2 \times 2$ matrices. We add
matrices simply by adding the elements in corresponding positions, and in order 
to multiply a matrix by a scalar, we multiply each element of the matrix by that 
scalar. Even though we have not associated any magnitude or direction with 
these matrices, they are nonetheless vectors. 

An explanation of why $R^n$ is called $n$-dimensional was given in §10.12. For
the general (abstract) case we call a vector space finite-dimensional if there
exists a finite set of vectors $\mathbf{u}_1, \ldots, \mathbf{u}_n$ such that every vector $\mathbf{x}$ can be
represented in exactly one way as a linear combination of the $\mathbf{u}_i$'s:

$$\mathbf{x} = c_1\mathbf{u}_1 + \cdots + c_n\mathbf{u}_n.$$

The uniqueness of representation means that the coefficients $c_1, \ldots, c_n$ are
uniquely determined by $\mathbf{x}$. This implies that the $c$'s are all 0 if $\mathbf{x} = \mathbf{0}$. It also
implies that no $\mathbf{u}_i$ is $\mathbf{0}$, for if some $\mathbf{u}_j = \mathbf{0}$, the choice of $c_j$ cannot be uniquely
determined by $\mathbf{x}$. The set of vectors $\mathbf{u}_1, \ldots, \mathbf{u}_n$ is called a basis for the space. In
linear algebra it is proved that if $\mathbf{u}_1, \ldots, \mathbf{u}_n$ is a basis and $\mathbf{v}_1, \ldots, \mathbf{v}_m$ is a basis,
then $n = m$. The common value of $n$ and $m$ is defined to be the dimension of the
space. There are vector spaces that are not finite dimensional. Such a space is
said to be of infinite dimension. The space in Example 2 is infinite dimensional.
The space of matrices in Example 3 is of dimension four. One possible basis
consists of the four matrices exhibited here:

$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$
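To make the notion of a basis concrete, the sketch below represents $2 \times 2$ matrices as nested tuples and rebuilds a sample matrix from its coefficients relative to the four matrices having a single entry 1 and all other entries 0. The names `BASIS`, `coordinates`, and `combine`, and the sample matrix `M`, are illustrative assumptions.

```python
# The four matrices with a single entry 1 and all other entries 0.
BASIS = (
    ((1, 0), (0, 0)),
    ((0, 1), (0, 0)),
    ((0, 0), (1, 0)),
    ((0, 0), (0, 1)),
)


def add(A, B):
    """Entrywise sum of two 2x2 matrices."""
    return tuple(tuple(a + b for a, b in zip(ra, rb)) for ra, rb in zip(A, B))


def scale(c, A):
    """Product of the scalar c and the matrix A."""
    return tuple(tuple(c * a for a in row) for row in A)


def coordinates(M):
    """Coefficients c1, ..., c4 of M relative to BASIS: simply its four entries."""
    return (M[0][0], M[0][1], M[1][0], M[1][1])


def combine(coeffs):
    """The linear combination c1*u1 + ... + c4*u4 of the basis matrices."""
    total = ((0, 0), (0, 0))
    for c, E in zip(coeffs, BASIS):
        total = add(total, scale(c, E))
    return total


M = ((3, -1), (4, 2))
coeffs = coordinates(M)
rebuilt = combine(coeffs)
print(coeffs, rebuilt)
```

Because each entry of a matrix is picked out by exactly one of the four basis matrices, the coefficients are forced, which is the uniqueness required of a basis.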

11.1 / LINEAR TRANSFORMATIONS 

In Example 2 the vectors are functions, namely continuous functions from the unit
interval [0, 1] to the real numbers. In this section, we introduce another very
important kind of function space. The vectors in these spaces are the very
simple functions called linear transformations. By a linear transformation we
mean a function $T$ from a vector space $\mathcal{V}$ to a vector space $\mathcal{W}$ having the
following two properties:

1. $T(\mathbf{x}) + T(\mathbf{y}) = T(\mathbf{x} + \mathbf{y})$ for all $\mathbf{x}$ and $\mathbf{y}$ in $\mathcal{V}$, and

2. $T(c\mathbf{x}) = cT(\mathbf{x})$ for every scalar $c$, and every $\mathbf{x} \in \mathcal{V}$.

Notice that from (1) it follows that $T(\mathbf{0}) + T(\mathbf{0}) = T(\mathbf{0})$, whence $T(\mathbf{0}) = \mathbf{0}$. This is
an important fact. The same result can be deduced from (2).

Very often, for convenience, we shall write $T\mathbf{x}$ instead of $T(\mathbf{x})$, omitting the
parentheses. This is not done for all functions, but is a common practice with
linear transformations.

The statement that $f$ is a function from the set $A$ to the set $B$ is frequently
abbreviated to $f : A \to B$. It means that $f$ is defined at every point of $A$, and that
for each $a$ in $A$, $f(a) \in B$. In other words, the domain of $f$ is all of $A$, but its
range may be either all of $B$ or just some proper subset. Therefore, when we say
that $T$ is a linear transformation, we shall imply that $T : \mathcal{V} \to \mathcal{W}$ for some pair of
vector spaces $\mathcal{V}$ and $\mathcal{W}$, and that $T$ satisfies Properties (1) and (2). The terms
linear function and linear map are frequently used as synonyms for linear
transformation.

In the important special case where $\mathcal{V} = \mathcal{W}$, we shall follow the practice, in
this book, of referring to the linear transformation as a linear operator. In other
words, a linear operator, in our terminology, is a linear transformation which
maps a vector space into itself.

If $\mathcal{V}$ and $\mathcal{W}$ are two vector spaces for which there exists a linear
transformation $T : \mathcal{V} \to \mathcal{W}$ such that $T$ maps $\mathcal{V}$ in a one-to-one manner onto all of $\mathcal{W}$,
we say that $\mathcal{V}$ is isomorphic to $\mathcal{W}$. The property of one-to-oneness signifies that if
$\mathbf{x} \neq \mathbf{y}$, then $T\mathbf{x} \neq T\mathbf{y}$. Therefore $T$ has an inverse, say $T^{-1}$, from $\mathcal{W}$ to $\mathcal{V}$, and since
the inverse of a linear transformation is linear (Exercise 3) it follows that if one
vector space is isomorphic to a second, then the second is isomorphic to the
first, so it is sufficient to say simply that the two spaces are isomorphic.
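The two defining properties, and the consequence that the zero vector is carried to the zero vector, can be tested on a concrete map. In the sketch below the map `T` (multiplication by a hypothetical $2 \times 3$ integer matrix, so a map from $R^3$ to $R^2$) and the map `S` (a translation, which is not linear) are illustrative assumptions, as are the sample vectors.

```python
def add(x, y):
    return tuple(a + b for a, b in zip(x, y))


def scale(c, x):
    return tuple(c * a for a in x)


def T(x):
    # Hypothetical linear map from R^3 to R^2: multiplication by a 2x3 matrix.
    A = ((1, 2, 0), (-1, 0, 3))
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)


def S(x):
    # A translation of R^3; it is not a linear transformation.
    return tuple(xi + 1 for xi in x)


x, y, c = (1, -2, 4), (0, 5, -3), 7

additive = T(add(x, y)) == add(T(x), T(y))        # Property (1)
homogeneous = T(scale(c, x)) == scale(c, T(x))    # Property (2)
zero_to_zero = T((0, 0, 0)) == (0, 0)             # T(0) = 0
shift_fixes_zero = S((0, 0, 0)) == (0, 0, 0)      # fails, so S cannot be linear
print(additive, homogeneous, zero_to_zero, shift_fixes_zero)
```

Checking whether a map sends $\mathbf{0}$ to $\mathbf{0}$ is a quick necessary test for linearity: the translation fails it at once.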
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Two vector spaces can be isomorphic without being identical. For example, 
the collection of all polynomials of degree not exceeding n — 1: 

p(t) = Co+ C\t + c 2 t 2 + ■ • • + c n -it n 1 


is a vector space if we add polynomials and multiply polynomials by scalars in 
the usual way. This vector space is isomorphic to R" (see Example 1 in the 
preceding section) by virtue of the mapping which associates p(t) with the 
vector (c 0 , c i, . . . , c„_i) in R n . As far as algebraic properties are concerned, two 
isomorphic vector spaces are indistinguishable. 

11.2 / THE VECTOR SPACE $\mathcal{L}(R^n, R^m)$ 

For reasons connected with its role in abstract algebra, the set of all linear 
transformations from $R^n$ to $R^m$ is denoted by $\mathcal{L}(R^n, R^m)$. We wish to prove that 
this set is a vector space. We define addition the same way we always have for 
functions; that is, if $T_1$ and $T_2$ belong to $\mathcal{L}(R^n, R^m)$, then their sum T is that 
function from $R^n$ to $R^m$ defined by 

\[ T(x) = T_1(x) + T_2(x) \quad \text{for all } x \in R^n. \]

T is then obviously a well-defined function from $R^n$ to $R^m$, but to show that it 
belongs to $\mathcal{L}(R^n, R^m)$ we must prove that it is linear. We therefore first verify 
Property (1): 

\[
\begin{aligned}
T(x + y) &= T_1(x + y) + T_2(x + y) = T_1(x) + T_1(y) + T_2(x) + T_2(y) \\
&= [T_1(x) + T_2(x)] + [T_1(y) + T_2(y)] = T(x) + T(y).
\end{aligned}
\]

Property (2) is checked as follows: 

\[ T(cx) = T_1(cx) + T_2(cx) = cT_1(x) + cT_2(x) = c[T_1(x) + T_2(x)] = cT(x). \]

This completes the proof that if $T_1$ and $T_2$ are any two elements of 
$\mathcal{L}(R^n, R^m)$, their sum must also belong to this set. It is even easier to prove that a 
scalar multiple of a linear transformation is a linear transformation. The other 
vector space axioms are obviously satisfied. The zero vector in this space 
$\mathcal{L}(R^n, R^m)$ is the linear transformation which maps every vector of $R^n$ into the 
zero vector of $R^m$. The reader should verify that this function is linear. 


11.3 / MATRICES AND LINEAR TRANSFORMATIONS 

It is assumed that students using this book have some familiarity with the 
algebra of matrices. No extensive knowledge or expertise is needed, however. 
We can define the product AB of a matrix A with m rows and n columns (we 
call it an m × n matrix) and an n × p matrix B. The product C = AB is an m × p 
matrix. If the elements of A are $a_{ij}$ (i = 1, 2, . . . , m and j = 1, 2, . . . , n), and if the 
elements of B are $b_{jk}$ (j = 1, 2, . . . , n; k = 1, 2, . . . , p), then the elements of C are 
$c_{ik}$, given by 

\[ c_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk}, \qquad i = 1, \ldots, m; \; k = 1, \ldots, p. \]

We do not define AB unless the number of columns of A is the same as the 
number of rows in B. The order of the factors A and B is significant. That is, BA 
may not be defined, even though AB is; and even if AB and BA are both 
defined, they need not be equal. It is useful to know that matrix multiplication is 
associative. That is, (AB)D = A(BD) when all the products shown here are 
defined. 
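The definition of the product, and the remarks about the order of the factors, can be tried out numerically. The following Python sketch is only an illustration under our own assumptions: the function name `matmul` and the matrices A, B, D are hypothetical choices, not taken from the text. Matrices are stored as lists of rows.

```python
def matmul(A, B):
    """Product of an m x n matrix A and an n x p matrix B (lists of rows)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must equal rows of B")
    p = len(B[0])
    # c_ik is the sum over j of a_ij * b_jk
    return [[sum(row[j] * B[j][k] for j in range(n)) for k in range(p)]
            for row in A]

# Hypothetical 2 x 2 examples chosen for illustration.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
D = [[2, 0], [1, 3]]

AB = matmul(A, B)   # swaps the columns of A
BA = matmul(B, A)   # swaps the rows of A, so AB and BA differ

# Associativity: both groupings give the same matrix.
left = matmul(matmul(A, B), D)
right = matmul(A, matmul(B, D))
```

Here AB and BA are both defined but unequal, while `left` and `right` agree, as associativity requires.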

An m × n matrix A can be used to define a function from $R^n$ to $R^m$ as follows. 
If $x = (x_1, \ldots, x_n)$ is given, display x as an n × 1 matrix and multiply A times x to 
give an m × 1 matrix y as shown here: 

\[
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}.
\]

For each x in $R^n$ this defines y = Ax in $R^m$. It is easy to show (see Exercise 1) that 
the function from $R^n$ to $R^m$ defined in this way is a linear transformation. 
Conversely, if T is any element of $\mathcal{L}(R^n, R^m)$ there is a certain m × n matrix, A, 
such that Tx = Ax for every x in $R^n$. To prove this we use the standard basis 
$e_1, \ldots, e_n$ in $R^n$, as defined in (10.12-8). Let us denote the corresponding standard 
basis in $R^m$ by $u_1, \ldots, u_m$; that is, $u_1 = (1, 0, \ldots, 0), \ldots, u_m = (0, \ldots, 1)$. With T 
given, we know that each $Te_i$ is a linear combination of the $u_j$'s, thus: 

\[
\begin{aligned}
Te_1 &= \alpha_{11} u_1 + \alpha_{12} u_2 + \cdots + \alpha_{1m} u_m \\
Te_2 &= \alpha_{21} u_1 + \alpha_{22} u_2 + \cdots + \alpha_{2m} u_m \\
&\;\;\vdots \\
Te_n &= \alpha_{n1} u_1 + \alpha_{n2} u_2 + \cdots + \alpha_{nm} u_m
\end{aligned}
\tag{11.3-1}
\]

for some set of scalar coefficients $\alpha_{ij}$. These equations show what T does to the 
standard basis elements in $R^n$, and we shall see that this completely determines 
T. Let x be any vector in $R^n$, and suppose $x = (x_1, x_2, \ldots, x_n)$. Then 
$x = x_1 e_1 + x_2 e_2 + \cdots + x_n e_n$, and since T is linear, 

\[ Tx = T\left( \sum_{i=1}^{n} x_i e_i \right) = \sum_{i=1}^{n} x_i T(e_i) = \sum_{i=1}^{n} \sum_{j=1}^{m} x_i \alpha_{ij} u_j, \]

which means that $Tx = y = (y_1, y_2, \ldots, y_m)$, where 

\[
\begin{aligned}
\alpha_{11} x_1 + \alpha_{21} x_2 + \cdots + \alpha_{n1} x_n &= y_1 \\
\alpha_{12} x_1 + \alpha_{22} x_2 + \cdots + \alpha_{n2} x_n &= y_2 \\
&\;\;\vdots \\
\alpha_{1m} x_1 + \alpha_{2m} x_2 + \cdots + \alpha_{nm} x_n &= y_m.
\end{aligned}
\]

This can be written as Ax = y, where the elements of the matrix A are defined 
by $a_{ij} = \alpha_{ji}$. This can be expressed by saying that the m × n matrix, A, is the 
transpose of the n × m coefficient matrix of the system (11.3-1). The transpose 
of a matrix M is a matrix obtained by interchanging the rows and columns of M. 
We denote the transpose of M by $M^T$. In §12.8 we shall use these facts: the 
transpose of the product of two matrices is the product of the transposes in 
reverse order; the transpose of the sum is the sum of the transposes; and the 
transpose of the transpose of a matrix is just the original matrix. The student is 
asked to verify these properties in Exercise 4. 
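The construction just described can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than part of the text: the helper names `standard_matrix` and `apply`, and the particular map T from $R^3$ to $R^2$, are hypothetical. The sketch evaluates T on each standard basis vector, which gives the rows of the coefficient array in (11.3-1), and then transposes.

```python
def standard_matrix(T, n):
    """Return the m x n matrix A with Tx = Ax, given a linear map T on R^n."""
    # Row i of `images` holds the coefficients of T(e_i), as in (11.3-1).
    images = []
    for i in range(n):
        e = [0] * n
        e[i] = 1
        images.append(list(T(e)))
    m = len(images[0])
    # A is the transpose of that coefficient array: a_ij = alpha_ji.
    return [[images[j][i] for j in range(n)] for i in range(m)]

def apply(A, x):
    """Multiply the matrix A (a list of rows) by the vector x."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

# Hypothetical linear map from R^3 to R^2, chosen for illustration.
def T(x):
    return [x[0] + 2 * x[1], 3 * x[2] - x[0]]

A = standard_matrix(T, 3)
x = [1, 1, 1]
same = apply(A, x) == T(x)   # Ax and Tx agree for this x
```

The columns of A are exactly the images $Te_1, Te_2, Te_3$, which is another way of stating $a_{ij} = \alpha_{ji}$.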

Observe that the zero element of $\mathcal{L}(R^n, R^m)$ is represented by the m × n 
matrix in which all the entries are zero. 

The matrix A which represents T was constructed with the help of what 
were called the standard bases in the domain and range spaces, $R^n$ and $R^m$. There 
are other bases and if we had used them we would have obtained other matrix 
representations of T. This suggests that A might more properly be referred to as 
the standard representation of T. But since these are the only bases which we 
shall use, we shall sometimes speak only of “the” matrix representation of a 
linear transformation. 

It is possible to summarize the results of this section up to this point by the 
following theorem, which is of basic importance, especially in the next few 
sections. 

THEOREM I. Each m × n matrix is the standard representation of a unique 
linear transformation from $R^n$ to $R^m$, and conversely, every element of 
$\mathcal{L}(R^n, R^m)$ has a unique standard representation as an m × n matrix. 

If $T \in \mathcal{L}(R^n, R^m)$ and $L \in \mathcal{L}(R^m, R^p)$, then we can define a function $L \circ T$ from 
$R^n$ to $R^p$ as follows: 

\[ (L \circ T)(x) = L[T(x)] \quad \text{for all } x \in R^n. \]

This is a composite function, the composition of L and T, and is easily proved 
to be linear (Exercise 2). Therefore, $L \circ T \in \mathcal{L}(R^n, R^p)$. It is important to be able 
to express the matrix which represents $L \circ T$ in terms of the matrices 
representing L and T. This can be done in the following straightforward way. Let A be 
the m × n matrix representing T and let B be the p × m representation of L. 
Then, 

\[ (L \circ T)(x) = L[Tx] = B(Ax) = By, \]

where x is any vector of $R^n$ and y is its image under T in $R^m$. Componentwise, 

\[
\begin{aligned}
y_i &= \sum_{j=1}^{n} a_{ij} x_j && (i = 1, 2, \ldots, m), \\
(By)_k &= \sum_{i=1}^{m} b_{ki} y_i && (k = 1, 2, \ldots, p) \\
&= \sum_{i=1}^{m} b_{ki} \left( \sum_{j=1}^{n} a_{ij} x_j \right) \\
&= \sum_{j=1}^{n} \left( \sum_{i=1}^{m} b_{ki} a_{ij} \right) x_j \\
&= \sum_{j=1}^{n} c_{kj} x_j \\
&= (Cx)_k,
\end{aligned}
\]

so that By = Cx, where C = BA is the p × n matrix whose element in the position (k, j) is given by 

\[ c_{kj} = \sum_{i=1}^{m} b_{ki} a_{ij}. \]
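A small numerical check of the rule C = BA may help fix the order of the factors. The Python sketch below is an illustration only; the helper names and the matrices A (2 × 3, standing for T) and B (3 × 2, standing for L) are hypothetical choices of ours.

```python
def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(row[j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for row in A]

def apply(A, x):
    """Multiply the matrix A by the vector x."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

A = [[1, 2, 0], [-1, 0, 3]]      # 2 x 3, represents T from R^3 to R^2
B = [[1, 1], [0, 2], [4, -1]]    # 3 x 2, represents L from R^2 to R^3
C = matmul(B, A)                 # 3 x 3, represents the composition L o T

x = [2, -1, 5]
two_steps = apply(B, apply(A, x))   # first T, then L
one_step = apply(C, x)              # the single matrix C = BA
```

Both `two_steps` and `one_step` give the same vector. Note that the matrix of the transformation applied first stands on the right in the product.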


As has been pointed out earlier, if L and T are linear transformations from a 
space $\mathcal{U}$ to a space $\mathcal{V}$, then their sum L + T, defined by 

\[ (L + T)(x) = Lx + Tx, \]

is also a linear transformation from $\mathcal{U}$ to $\mathcal{V}$. Suppose that the matrix A 
represents L and the matrix B represents T. Then it is very easy to define 
A + B so that it will be a matrix representing L + T: we simply take each 
element of A + B to be the sum of the corresponding elements in A and B. 
Notice that this defines the sum of two matrices if and only if they have the 
same number of rows and the same number of columns. Similarly, if we define 
the product of the matrix A by the scalar $\alpha$ to be the matrix obtained by 
multiplying each element of A by $\alpha$, then the 
matrix $\alpha A$ represents the linear transformation $\alpha L$. By $A\alpha$ we shall mean the 
same matrix as $\alpha A$. It is easy to see that the set of all m × n matrices forms a 
vector space of dimension mn. 


11.4 / SOME SPECIAL CASES 

If m = 1, then $\mathcal{L}(R^n, R^m) = \mathcal{L}(R^n, R)$ is the vector space of all real-valued linear 
functions on $R^n$. The elements of this space are called linear functionals on $R^n$. 
From Theorem I we know that each such linear functional can be represented by 
a 1 × n matrix. Such a matrix, consisting of 1 row and n columns, is just an 
ordered n-tuple of real numbers which acts on the vector x to give L(x) as 
follows: 

\[
L(x) = (a_1, a_2, \ldots, a_n) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= a_1 x_1 + a_2 x_2 + \cdots + a_n x_n. \tag{11.4-1}
\]

Here we have a (1 × n) matrix multiplying an (n × 1) matrix to give the (1 × 1) 
matrix $\sum_{i=1}^{n} a_i x_i$, which is just a real number. If we think of the 1 × n matrix that 
represents L as a vector a, we see that the formula (11.4-1) can be written as 
L(x) = a · x, where we use the dot product in $R^n$ as defined in (10.12-1). 

In Chapter 12 we shall find it useful to think of vectors as special cases of 
matrices, and to evaluate dot products by matrix multiplication. This is easy as 
soon as we decide whether to consider a vector as a single column matrix or as a 
single row matrix. (We shall hereafter omit the word “single” in this context.) 
The rule is easy: when vectors are being used in matrix operations we consider 
them to be column matrices unless the transpose sign is attached, in which case 
we regard the vector as a row matrix. For example, if one wishes to think of the 
vector a in $R^n$ as a matrix, it is taken to be the (n × 1) matrix 

\[ a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}, \]

but the transpose, $a^T$, is of course $(a_1, a_2, \ldots, a_n)$. Now we have the choice of 
writing L(x) either as the dot product of vectors, i.e., L(x) = a · x, or as an 
ordinary matrix product, i.e., $L(x) = a^T x$. In summary, we can rewrite (11.4-1) 
in the following more convenient forms: 

\[ L(x) = a^T x = a \cdot x. \tag{11.4-2} \]

Therefore, for each linear functional L on $R^n$, there is a vector a in $R^n$ which 
represents L in the sense that Lx = a · x. It is trivial to show that this one-to-one 
correspondence is linear. Therefore $\mathcal{L}(R^n, R)$ and $R^n$ are isomorphic, i.e., 
algebraically the same. 

In the special case $\mathcal{L}(R, R^m)$, the linear transformations are, by Theorem I, 
represented by m × 1 matrices, which are at once identified with vectors in $R^m$. 
In other words, if $T \in \mathcal{L}(R, R^m)$, then there exists some $a \in R^m$ such that, for 
every real number x, 

\[ Tx = ax = \begin{pmatrix} a_1 x \\ a_2 x \\ \vdots \\ a_m x \end{pmatrix}. \]

This says that every linear function from R to $R^m$ is defined by taking scalar 
multiples of some fixed vector in $R^m$. This correspondence between $\mathcal{L}(R, R^m)$ and 
$R^m$ is easily seen to be one-to-one and linear, giving us again an example of an 
isomorphism between two vector spaces. 

Finally, $\mathcal{L}(R, R)$ and R are isomorphic, for all the linear functions from R to 
R are of the form f(x) = ax, and the correspondence mapping f into a is a linear 
transformation from $\mathcal{L}(R, R)$ to R which is one-to-one. Our use here of the term 
“linear function” is different from its use in high school algebra, where functions 
of the form F(x) = ax + b are called linear. Such functions are usually called 
affine. They are obtained from linear functions simply by adding some constant. 
In the more general setting of functions from $R^n$ to $R^m$ an affine function f would 
be defined by 

\[ f(x) = Tx + b, \]

where $T \in \mathcal{L}(R^n, R^m)$ and $b \in R^m$. 


11.5 / NORMS 

We have remarked in §11 that it is not an essential characteristic of a vector 
to have magnitude and direction. Nevertheless, in many vector spaces it is 
possible and useful to define a magnitude, or length, for each vector. When this 
is done, we denote the length of x by ||x|| and call it the norm of x. (See the 
definition in (10.1-16) for $R^3$ and in (10.12-2) for $R^n$.) Just as we listed in §11 the 
algebraic rules governing the elements in a vector space, so we now list the rules 
that we impose on a norm in a vector space V. A function from V to R, with 
value ||x|| at x, is called a norm provided that the following rules (axioms) are 
satisfied: 

1. $\|x\| \geq 0$ for each x. 

2. $\|x\| = 0$ if and only if x = 0. 

3. $\|cx\| = |c| \, \|x\|$ for each x and each scalar c. 

4. $\|x + y\| \leq \|x\| + \|y\|$ for each x and y. 

For a good many purposes it does not matter exactly how ||x|| is defined, as 
long as it has the foregoing properties (1)-(4). We have defined the norm of x in 
$R^n$ in a particular way in (10.12-2). That definition is related to the dot product in 
$R^n$. But in general a vector space can have a norm without any relation to a dot 
product. Other definitions of a norm are possible in $R^n$, as we shall see presently. 

The property (4) is called the triangle inequality, by analogy with the same 
inequality that is valid in $R^3$ and $R^n$. (See (10.12-6), where the reason for the name 
“triangle inequality” is given.) 

We can transform the inequality (4) in various ways. For instance, replace x 
by x + y and y by -y in (4). Then, because (x + y) + (-y) = x and $\|-y\| = \|y\|$, we obtain 

\[ \|x\| \leq \|x + y\| + \|y\|, \]

whence 

\[ \|x\| - \|y\| \leq \|x + y\|. \tag{11.5-1} \]

If we exchange x and y here and bear in mind that y + x = x + y, we obtain 

\[ \|y\| - \|x\| \leq \|x + y\|. \tag{11.5-2} \]

Since $|\,\|x\| - \|y\|\,|$ is either $\|x\| - \|y\|$ or $\|y\| - \|x\|$, depending on the sign of the 
difference, we conclude that 

\[ |\,\|x\| - \|y\|\,| \leq \|x + y\|. \tag{11.5-3} \]

Because $\|-y\| = \|y\|$, we can change x + y to x - y in each of the last three 
inequalities. 

Actually, the triangle inequality (4) is derivable from (11.5-1). See Exercise 
8. On this account we shall also refer to (11.5-1) as a triangle inequality. 
Likewise for (11.5-2) and (11.5-3). 

If ||x|| =1 we call x a unit vector. If x is any nonzero vector, a suitable 
multiple of x will be a unit vector. The right multiplier is l/||x||, as we see by 
property (3) of the norm. In any vector space with a norm, the unit sphere is 
defined to be the set of all unit vectors; therefore the equation of the unit sphere 
is ||x|| = 1. 

The norm of §10.12: 

\[ \|x\| = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} \]

is called the Euclidean norm in $R^n$. We could also define other norms as follows: 

\[ \|x\| = \sum_{i=1}^{n} |x_i| \]

or 

\[ \|x\| = \text{maximum of } |x_1|, |x_2|, \ldots, |x_n|. \]

Showing that these last two definitions satisfy the conditions (1) to (4) is left for 
the exercises. 
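The three norms are easy to compare numerically. The following Python sketch is an illustration only: the function names `norm2`, `norm1`, `norm_max` and the sample vectors are our own hypothetical choices, and checking a few vectors is of course no substitute for the proofs asked for in the exercises.

```python
import math

def norm2(x):
    """Euclidean norm: square root of the sum of squares."""
    return math.sqrt(sum(t * t for t in x))

def norm1(x):
    """Sum of the absolute values of the components."""
    return sum(abs(t) for t in x)

def norm_max(x):
    """Largest absolute value among the components."""
    return max(abs(t) for t in x)

# Hypothetical sample vectors in R^2.
x = [3.0, -4.0]
y = [1.0, 2.0]
s = [p + q for p, q in zip(x, y)]   # the vector x + y
c = -2.0
cx = [c * t for t in x]             # the vector cx

# Spot-check properties (3) and (4) for each norm on these vectors.
checks = {
    f.__name__: (math.isclose(f(cx), abs(c) * f(x)), f(s) <= f(x) + f(y))
    for f in (norm2, norm1, norm_max)
}
```

The same vector x has length 5, 7, or 4 according to which norm is used, so the unit sphere $\|x\| = 1$ is a different set for each norm.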

11.6 / METRICS 

In preparation for §11.7, where we shall discuss point sets in vector spaces, and 
continuity of functions with domains of definition and ranges of values in vector 
spaces, we shall now discuss measurements of distance in a vector space 
provided with a norm. 

In dealing with any vector space, it is common practice to use the words 
point and vector interchangeably. Thus we may speak either of the vector x or 
the point x. In any vector space provided with a norm we call $\|x - y\|$ the distance 
between x and y. For vectors in the plane the appropriateness of this definition is 
shown in Fig. 83, which shows diagrammatically how the common length of the 
vectors x - y and y - x is the same as the distance between the ends of the 
vectors x and y. 

Fig. 83. 

Just as we abstracted the notion of the length of a vector by calling the 
length of x its norm ||x|| and listing the four conditions that the norm must satisfy, 
so we may abstract the idea of distance between pairs of points and list four 
conditions that we want to be satisfied by a distance function. A distance 
function is a function that gives us a value d(x, y) for the distance from x to y. 
We require the following four axioms to be satisfied: 

$D_1$: $d(x, y) \geq 0$ for all x and y. 

$D_2$: d(x, y) = 0 if and only if x = y. 

$D_3$: d(x, y) = d(y, x) for all x and y. 

$D_4$: $d(x, y) + d(y, z) \geq d(x, z)$. 

Axiom D 4 is called the triangle inequality because it expresses the fact that in a 
configuration of three points, which we may think of as forming a triangle, the 
distance along one leg of the triangle is never greater than the sum of the 
distances along the other two legs. 

Our definition $d(x, y) = \|x - y\|$ of distance in a vector space with a norm 
satisfies the four conditions $D_1$-$D_4$; $D_4$ is satisfied as a consequence of the 
triangle inequality (4) in §11.5. 

A distance function satisfying conditions $D_1$-$D_4$ is called a metric. We have 
seen how to use a norm to define a metric. The idea of a metric need not be 
confined to vector spaces, however, for there are no references to addition of 
vectors or scalar multiples of vectors in the axioms $D_1$-$D_4$. In fact, there are 
metrics which do not come from norms. See Exercises 11 and 12. 
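One standard example of a metric that does not come from a norm is the so-called discrete metric, for which d(x, y) = 1 whenever x and y are different. We do not claim that this is the example intended in Exercises 11 and 12; it is offered only as an illustration, and the names in the Python sketch below are hypothetical. A metric of the form $\|x - y\|$ must satisfy d(cx, cy) = |c| d(x, y) by property (3) of §11.5, and the discrete metric fails this test.

```python
def discrete(x, y):
    """Discrete metric: 0 if the points coincide, 1 otherwise."""
    return 0 if x == y else 1

def is_metric_on(points, d):
    """Check D1-D4 for every choice of points from a finite list."""
    for x in points:
        for y in points:
            if d(x, y) < 0:                       # D1
                return False
            if (d(x, y) == 0) != (x == y):        # D2
                return False
            if d(x, y) != d(y, x):                # D3
                return False
            for z in points:
                if d(x, y) + d(y, z) < d(x, z):   # D4
                    return False
    return True

# Hypothetical sample points in the plane, stored as tuples.
points = [(0, 0), (1, 0), (0, 2), (3, 3)]
axioms_hold = is_metric_on(points, discrete)

x, y = (1, 0), (0, 2)
doubled = discrete((2, 0), (0, 4))   # distance between 2x and 2y
```

Here `doubled` equals 1, whereas a metric arising from a norm would give 2 d(x, y) = 2.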

11.7 / OPEN SETS AND CONTINUITY 

If E is a set and G is a set, then the set consisting of all objects which belong to 
E or to G, or to both, is called the union of E and G, which we denote by $E \cup G$. 
The set consisting of all members of both E and G is denoted by $E \cap G$ and is 
called the intersection of E and G. If p(x) denotes some proposition involving x, 
then {x : p(x)} denotes that set consisting of all x for which the proposition p(x) is 
true. For example, {x : x · a = 3} is the set of all vectors whose inner product with 
the vector a is 3. Similarly, $\{x \in E : p(x)\}$ is that set of vectors belonging to the set 
E for which the proposition p(x) is true. 

For any vector space having a norm, we define the sphere of radius r 
centered at the vector a, denoted by S(a, r), as follows: 

\[ S(a, r) = \{x : \|x - a\| = r\}. \]

The open ball, B(a, r), of radius r, centered at a, is given by 

\[ B(a, r) = \{x : \|x - a\| < r\}. \]

The corresponding closed ball, $\bar{B}(a, r)$, is the union of these two sets; that is, 

\[ \bar{B}(a, r) = B(a, r) \cup S(a, r) = \{x : \|x - a\| \leq r\}. \]

Notice that in $R^3$, S(a, r) is a spherical surface and B(a, r) consists of the set of 
points lying inside this surface. $\bar{B}(a, r)$ is what might be called the solid sphere 
made up of the surface, together with the set of points lying inside. How would 
you describe S(a, r), B(a, r), and $\bar{B}(a, r)$ in $R^2$? in R? Notice that $S(a, r) \cap B(a, r)$ 
is the empty set. 

If E is any subset of a normed vector space, we define an interior point of E 
to be a point which belongs to E and which is the center of some open ball 
contained in E . Notice that this is just a more general statement of our earlier 
definition in §5.1. An open set can still be defined as one consisting entirely of 
interior points, and a closed set is still the complement of an open set. 

By a neighborhood of a point, we simply mean an open set containing the 
point. This is a generalization of our earlier usage where the sets which we called 
neighborhoods were open intervals, open disks, or open rectangles. The really 
essential property of what we call a neighborhood of a point is that it is a set for 
which the point is an interior point. The definition which we use is the simplest 
way to get this property. 

A subset E of a normed vector space is said to be bounded in case there is 
some number M such that $\|x\| < M$ for all $x \in E$. 

Let $\{x^k\}_{k=1}^{\infty}$ denote a sequence in a vector space $\mathcal{V}$ having a norm $\| \; \|$. The 
superscript is not an exponent but is simply an index giving the order of the term 
in the sequence. As one would expect from the theory of sequences of real 
numbers (see §1.62), 

\[ \lim_{k \to \infty} x^k = a \]

has the meaning that if $\epsilon$ is any positive number, then there exists a positive 
integer N such that $\|x^k - a\| < \epsilon$ whenever $N \leq k$. This is also expressed by 
saying that the sequence $\{x^k\}_{k=1}^{\infty}$ converges to a. In terms of our recently 
introduced notation, it can be expressed by saying that if $\epsilon$ is any positive 
number, then all but at most a finite number of the terms of the sequence lie in 
the $\epsilon$-ball, $B(a, \epsilon)$, centered at a. 

The definition of continuity still makes sense in all vector spaces having 
norms. Suppose that $\mathcal{X}$ and $\mathcal{Y}$ are normed vector spaces and f is a function from 
some subset D of $\mathcal{X}$ to $\mathcal{Y}$. To say that f is continuous at the point a of its domain 
D means that if $\epsilon$ is any positive number, then there exists some positive 
number $\delta$ such that 

\[ \|f(a) - f(x)\| < \epsilon \quad \text{for all } x \in D \text{ such that } \|x - a\| < \delta. \]

In geometrical terms this says that if $B(f(a), \epsilon)$ is any open ball, centered at f(a), 
in $\mathcal{Y}$, then there is some open ball $B(a, \delta)$, centered at a, in $\mathcal{X}$ such that f maps 
the intersection $D \cap B(a, \delta)$ into $B(f(a), \epsilon)$. If a happens to be an interior point of 
D, then we don’t have to talk about the intersection of D with $B(a, \delta)$; we can 
simply say that for some $\delta$, f maps $B(a, \delta)$ into $B(f(a), \epsilon)$. Notice that the number 
$\delta$ may depend on both a and $\epsilon$. See Fig. 84 for a diagram which may help to 
picture the idea of continuity. The diagram illustrates, with c, $\delta_1$, $\epsilon_1$ in place of a, 
$\delta$, $\epsilon$, the case in which c is an interior point of D and the ball $B(c, \delta_1)$ is in D. 

Notice that since continuity is defined with the help of the norm, the norm is 
automatically a continuous function. In fact, to say that the norm is continuous 
at a means only that if $\epsilon$ is any positive number, there is some positive number $\delta$ 
such that $|\,\|x\| - \|a\|\,| < \epsilon$ for all x such that $\|x - a\| < \delta$. By (11.5-3), with x + y 
changed to x - a, we have $|\,\|x\| - \|a\|\,| \leq \|x - a\|$, which shows that we can take $\delta$ to be $\epsilon$. 

In §5.3 we stated that the composition of continuous functions is continuous. 
We shall prove here a form of this statement which is appropriate for our 
purposes. Here and later we use the notation $g \circ f$ for the composite function 
whose value at x is $(g \circ f)(x) = g(f(x))$, it being assumed that f is defined at x and 
g is defined at f(x). We call $g \circ f$ the composition of g and f (in that order). 

THEOREM II. Suppose that f is a function from some subset of $\mathcal{X}$ to $\mathcal{Y}$ and that 
g is a function from some subset of $\mathcal{Y}$ to $\mathcal{Z}$, where $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$ are normed 
vector spaces. Suppose further that f is defined at a and that g is defined at 
f(a) = b. If f is continuous at a and g is continuous at b, then the 
composition $g \circ f = \phi$ is continuous at a. 

Proof. Let g(b) = c. Since g is continuous at b, if $B(c, \epsilon)$ is any open ball of 
radius $\epsilon > 0$, centered at c, there exists some $\delta > 0$ such that $g(y) \in B(c, \epsilon)$ for all 
y in $B(b, \delta)$ at which g is defined. See Fig. 85. 

Since f is continuous at a, there is some $\rho > 0$ such that $f(x) \in B(b, \delta)$ for all 
$x \in B(a, \rho)$ at which f is defined. Therefore, for all those x in $B(a, \rho)$ at which $\phi$ 
is defined, $\phi(x) \in B(c, \epsilon)$. 

Observe that the points of $B(a, \rho)$ at which $\phi$ is defined are those points x at 
which f is defined for which f(x) belongs to the domain of g. 

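THEOREM III. If T is a linear transformation from $R^n$ to $R^m$, then there exists 
some number M such that 

\[ \|Tx\| \leq M \|x\| \quad \text{for all } x \in R^n. \]

Proof. Let the standard matrix representation of T be the m × n matrix A as 
in §11.3. Then for each $x \in R^n$, 

\[
Tx =
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix},
\]

where $y = (y_1, y_2, \ldots, y_m)$ is that vector in $R^m$ whose ith component $y_i$ is given by 

\[ y_i = \sum_{j=1}^{n} a_{ij} x_j. \]

By the Cauchy inequality (see (10.12-4)), 

\[ |y_i| \leq \left( \sum_{j=1}^{n} a_{ij}^2 \right)^{1/2} \left( \sum_{j=1}^{n} x_j^2 \right)^{1/2} = \left( \sum_{j=1}^{n} a_{ij}^2 \right)^{1/2} \|x\|. \]

Now, let Q denote the maximum of the m numbers 

\[ \left( \sum_{j=1}^{n} a_{ij}^2 \right)^{1/2} \qquad (i = 1, 2, \ldots, m). \]

Then $|y_i| \leq Q \|x\|$ for all i and 

\[ \|Tx\| = \|y\| = \sqrt{y_1^2 + y_2^2 + \cdots + y_m^2} \leq \sqrt{m Q^2 \|x\|^2} = \sqrt{m}\, Q \|x\|. \]

This proves the theorem and shows that one possible choice for M is $\sqrt{m}\, Q$. 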

A transformation T for which there exists a number M such that $\|Tx\| \leq M \|x\|$ 
for all x is said to be bounded, and M is called a bound for the 
transformation. Later, we shall find it useful to have a bound M for a linear 
transformation in terms of the dominant element (or elements) of its matrix 
representation, A. Let 

\[ K = \max |a_{ij}|, \]

where $i \in \{1, 2, \ldots, m\}$ and $j \in \{1, 2, \ldots, n\}$. Then, 

\[ Q \leq \sqrt{n K^2} = K \sqrt{n}. \]

Therefore we have that 

\[ \|Tx\| \leq \sqrt{nm}\, K \|x\| \tag{11.7-1} \]

for all $x \in R^n$. 
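The two bounds $\sqrt{m}\,Q$ and $\sqrt{nm}\,K$ can be compared on a specific matrix. The Python sketch below is an illustration under our own assumptions: the helper names, the 2 × 3 matrix A, and the use of randomly chosen test vectors are hypothetical, and sampling can only fail to find a counterexample; it proves nothing.

```python
import math
import random

def apply(A, x):
    """Multiply the matrix A (a list of rows) by the vector x."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def norm(x):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in x))

def bounds(A):
    """Return the pair (sqrt(m) * Q, sqrt(n * m) * K) for the matrix A."""
    m, n = len(A), len(A[0])
    Q = max(norm(row) for row in A)              # largest row length
    K = max(abs(a) for row in A for a in row)    # dominant entry
    return math.sqrt(m) * Q, math.sqrt(n * m) * K

A = [[1.0, -2.0, 0.5], [3.0, 0.0, -1.0]]   # hypothetical 2 x 3 example
M_Q, M_K = bounds(A)

# Largest observed ratio ||Ax|| / ||x|| over some random nonzero vectors.
random.seed(0)
samples = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(1000)]
worst = max(norm(apply(A, x)) / norm(x) for x in samples if norm(x) > 0)
```

For this matrix `worst` stays below `M_Q`, which in turn is smaller than `M_K`; the bound (11.7-1) is cruder but needs only the single number K.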

COROLLARY. If T is a linear transformation from $R^n$ to $R^m$, then T is 
continuous. 

Proof. By the preceding theorem, T is bounded; that is, there is some 
number M such that $\|Tx\| \leq M \|x\|$ for all $x \in R^n$. We may assume that M > 0, 
since any number larger than a bound is also a bound. Now consider any a in $R^n$ and 
any positive $\epsilon$. Let $\delta = \epsilon / M$, so that $\delta M = \epsilon$, and suppose that $\|x - a\| < \delta$. Then 

\[ \|Tx - Ta\| = \|T(x - a)\| \leq M \|x - a\| < \delta M = \epsilon. \]

Therefore T is continuous at a, by the definition of continuity given earlier in 
this section. Observe that the choice of $\delta$ depends on $\epsilon$ but not on a. The fact 
that $\delta$ is independent of a means that T has a property called uniform continuity 
on all of $R^n$. 

11.8 / A NORM ON $\mathcal{L}(R^n, R^m)$ 

It is not at all obvious how to define the magnitude of a linear transformation. 
An approach which turns out to be quite useful in various connections is based 
on what the transformation T does to the lengths of vectors. Of course, T will 
usually have different effects on different vectors — lengthening some, shortening 
others, and leaving the lengths of others unchanged. To get around this com- 
plication, we might try defining the norm of T just in terms of its effect on that 
vector which it stretches the most; that is, we might consider the ratio ||Tx||/||x|| 
and take the norm of T to be the maximum of this ratio as x ranges over the 
space, excluding, of course, the zero vector where the ratio is not defined. This 
approach can be simplified by noticing that 

\[ \frac{\|Tx\|}{\|x\|} = \left\| \frac{1}{\|x\|}\, Tx \right\| = \left\| T\!\left( \frac{x}{\|x\|} \right) \right\|. \]

Then, because $x / \|x\|$ is a unit vector, we see that the set of values taken on by the 
ratio $\|Tx\| / \|x\|$ for all nonzero vectors is exactly the same as the set of values 
taken by the function $\|T(u)\|$ when u is restricted to lie just on the unit sphere 
(the set of all vectors having norm 1). 

Now, let T be any linear transformation from $R^n$ to $R^m$. We want to show 
that 

\[ \|T\| = \max_{\|x\| = 1} \|Tx\| \tag{11.8-1} \]

defines a norm on the vector space $\mathcal{L}(R^n, R^m)$. But first notice that $\| \; \|$ carries a 
different meaning in each of its three appearances in (11.8-1). Since 
$x = (x_1, x_2, \ldots, x_n) \in R^n$, $\|x\|$ involves the length function on $R^n$. $\|Tx\|$, on the other 
hand, refers to the length function on $R^m$, and on the left-hand side, $\| \; \|$ is a 
function from $\mathcal{L}(R^n, R^m)$ to the real numbers. It is general practice to use the 
same symbol, $\| \; \|$, for all norms and leave it to the reader to infer from the 
context which norm is meant. 

Since both T and the norm are continuous functions, their composition, 
defined by $\|T(x)\|$, is a continuous function from $R^n$ to R by Theorem II, §11.7. 
The unit sphere in $R^n$ is both closed and bounded (Exercise 17) so by Theorems 
I and II of §5.3 the above function has a maximum value on this set. This shows 
that for each T, 

\[ \max_{\|x\| = 1} \|T(x)\| \]

exists and therefore (11.8-1) does define a function from $\mathcal{L}(R^n, R^m)$ to the real 
numbers. It is obviously nonnegative, since the maximum of a nonnegative 
function is nonnegative. If $T_0$ is the zero element of $\mathcal{L}(R^n, R^m)$, then $T_0(x) = 0$ for 
all x; hence, 

\[ \|T_0\| = \max_{\|x\| = 1} \|T_0(x)\| = 0. \]

If T is not the zero element, then there is some vector v in $R^n$ such that $T(v) \neq 0$. 
Then $u = v / \|v\|$ is a unit vector such that 

\[ T(u) = T\!\left( \frac{v}{\|v\|} \right) = \frac{1}{\|v\|}\, T(v) \neq 0, \]

and so $\|T(u)\| > 0$. Therefore 

\[ \|T\| = \max_{\|x\| = 1} \|T(x)\| \geq \|T(u)\| > 0. \]

We have now proved that the function $\| \; \|$ defined by (11.8-1) satisfies 
Properties (1) and (2) of §11.5. The fact that it satisfies (3) is easy to prove and will 
be left as an exercise for the reader. To prove that this function satisfies the 
fourth property required of every norm is slightly more difficult. What we must 
prove is that, for all $T_1$ and $T_2$ in $\mathcal{L}(R^n, R^m)$, 

\[ \max_{\|x\| = 1} \|(T_1 + T_2)x\| \leq \|T_1\| + \|T_2\| \tag{11.8-2} \]

for the left side here is $\|T_1 + T_2\|$. Now, by definition, $(T_1 + T_2)x = T_1 x + T_2 x$, and so, 
for any unit vector x, 

\[ \|(T_1 + T_2)x\| = \|T_1 x + T_2 x\| \leq \|T_1 x\| + \|T_2 x\| \leq \|T_1\| + \|T_2\|, \]

for certainly $\|T_1 x\| \leq \|T_1\|$ by the definition of $\|T_1\|$, and likewise for $T_2$. But then 
we see at once from the foregoing that (11.8-2) is true. We have now proved that 
the norm defined by (11.8-1) does indeed satisfy the triangle inequality. 

An important property of this norm is that, for all x, 

\[ \|Tx\| \leq \|T\| \, \|x\|. \tag{11.8-3} \]

The proof of this fact (Exercise 13) is short. The reader should pause long 
enough now to distinguish clearly among the three meanings which the norm 
symbol has in (11.8-3). 

The norm which we have defined for linear transformations is useful even 
though we have no way of evaluating it except in certain special cases. It will be 
sufficient for our purposes to have a practical bound for the norm, and we have 
already obtained one in the process of proving Theorem III of §11.7. It was 
found in (11.7-1) that if $T \in \mathcal{L}(R^n, R^m)$ and A is the m × n matrix representing T, 
then 

\[ \|T\| \leq \sqrt{nm}\, K, \qquad \text{where } K = \max_{\substack{1 \leq i \leq m \\ 1 \leq j \leq n}} |a_{ij}|. \]

We get an important lower bound for $\|T\|$ by considering what T does to the 
unit vectors making up the standard basis in $R^n$. Recall that for each 
$r \in \{1, 2, \ldots, n\}$, $e_r$ is that ordered n-tuple consisting of 1 in the rth place and 0’s in 
the other n - 1 places. If $C_r$ is the norm of $Te_r$, 

\[ \|Te_r\| = \|Ae_r\| = \|(a_{1r}, a_{2r}, \ldots, a_{mr})\| = \left( \sum_{i=1}^{m} a_{ir}^2 \right)^{1/2} = C_r. \]

$C_r$ can be thought of as the length of the vector represented by the rth column in 
A. Let $C = \max C_r$ for $r \in \{1, 2, \ldots, n\}$. In other words, C is the length of the 
longest of the n column vectors in A. Since $\|Tu\| = C$ for at least one unit vector u, 
$\|T\| \geq C$. From the definition of $C_r$ we see that $C_r$ is greater than or equal to 
$\max_i |a_{ir}|$. Hence, $C = \max_r C_r$ is greater than or equal to $\max_r (\max_i |a_{ir}|) = K$, and 
so $\|T\| \geq C \geq K$. Combining this with our previously obtained upper bound, we 
have 

\[ K \leq \|T\| \leq \sqrt{mn}\, K. \tag{11.8-4} \]
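The chain of inequalities $K \leq C \leq \|T\| \leq \sqrt{mn}\,K$ can be observed on a small example. In the Python sketch below, which is an illustration only, the 2 × 2 matrix A and all helper names are hypothetical. The norm $\|T\|$ is not computed exactly; it is estimated from below by taking the largest value of $\|Au\|$ over a finite set of unit vectors on the unit circle, to which the two standard basis vectors are added explicitly.

```python
import math

def apply(A, x):
    """Multiply the matrix A (a list of rows) by the vector x."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def norm(x):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in x))

A = [[2.0, 1.0], [0.0, 1.0]]   # hypothetical 2 x 2 example
m, n = len(A), len(A[0])

K = max(abs(a) for row in A for a in row)                      # dominant entry
C = max(norm([A[i][r] for i in range(m)]) for r in range(n))   # longest column
upper = math.sqrt(m * n) * K                                   # bound from (11.7-1)

# Unit vectors: the standard basis plus 3600 points on the unit circle.
units = [[1.0, 0.0], [0.0, 1.0]]
units += [[math.cos(2 * math.pi * k / 3600), math.sin(2 * math.pi * k / 3600)]
          for k in range(3600)]
estimate = max(norm(apply(A, u)) for u in units)   # lower estimate of ||T||
```

For this matrix K and C both equal 2, the estimate is a little larger, and the upper bound is 4, so the estimate falls inside the interval given by (11.8-4).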

The space $\mathcal{L}(\mathbb{R}^n, \mathbb{R})$ is an interesting one from the point of view of computing
norms. Recall that the elements of this space are called linear functionals and
that each one of them is represented by a unique vector in $\mathbb{R}^n$. Suppose that $T$ is
a linear functional on $\mathbb{R}^n$ and that $a = (a_1, a_2, \ldots, a_n)$ is its representing vector.
The norm of $a$, of course, is $\sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}$. The norm of $T$ is the maximum
value of $\|Tx\|$ on the unit sphere, that is, on the set of vectors having unit norm
in $\mathbb{R}^n$:

$$\|T\| = \max_{\|x\|=1} \|Tx\| = \max_{\|x\|=1} |a \cdot x| = \max_{\|x\|=1} |a_1x_1 + \cdots + a_nx_n|.$$

By the Cauchy inequality,

$$|a_1x_1 + \cdots + a_nx_n| \le \Bigl(\sum_{i=1}^{n} a_i^2\Bigr)^{1/2} \|x\| = \|a\| \cdot \|x\|.$$

So for all unit vectors, $\|Tx\| \le \|a\|$. Assuming that $T$ is not the zero functional, $a$
is not the zero vector and $u = a/\|a\|$ is a unit vector. A simple calculation shows
that $\|Tu\| = \|a\|$. Therefore,

$$\max_{\|x\|=1} \|Tx\| = \|a\|, \quad \text{and hence} \quad \|T\| = \|a\|.$$

This says that the norm of each nonzero linear functional on $\mathbb{R}^n$ is the same as
the $\mathbb{R}^n$ norm of its representing vector. Since this statement about equality of
norms is obviously also true for the zero functional, it is true in general.
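The equality $\|T\| = \|a\|$ can be illustrated with a short computation. This is only a sketch under stated assumptions: the representing vector $a = (3, -4, 12)$ and the helper names `functional` and `euclidean_norm` are chosen for the example and do not come from the text.

```python
import math
import random

def functional(a):
    """Return the linear functional x -> a . x represented by the vector a."""
    return lambda x: sum(ai * xi for ai, xi in zip(a, x))

def euclidean_norm(v):
    return math.sqrt(sum(c * c for c in v))

a = [3.0, -4.0, 12.0]            # hypothetical representing vector, ||a|| = 13
T = functional(a)
norm_a = euclidean_norm(a)

# The maximizing unit vector used in the text: u = a / ||a||.
u = [c / norm_a for c in a]
value_at_u = abs(T(u))

# By the Cauchy inequality, no unit vector gives a value larger than ||a||.
rng = random.Random(1)
largest_sampled = 0.0
for _ in range(1000):
    x = [rng.gauss(0.0, 1.0) for _ in a]
    length = euclidean_norm(x)
    x = [c / length for c in x]  # scale x to unit norm
    largest_sampled = max(largest_sampled, abs(T(x)))

print(norm_a, value_at_u, largest_sampled)
```

The sampled values stay below $\|a\|$, while the value at $u = a/\|a\|$ attains it, which is the content of the argument above.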

11.9 / $\mathcal{L}(\mathbb{R}^n)$

In the important special case where $m$ and $n$ are the same, $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^n)$ is
abbreviated to $\mathcal{L}(\mathbb{R}^n)$, and it denotes the vector space of all linear
transformations (in this case operators) from $\mathbb{R}^n$ to $\mathbb{R}^n$. (Recall that in this book
we use the term linear operator to denote a linear transformation from a vector
space to itself.) From Theorem I we know that the operators in $\mathcal{L}(\mathbb{R}^n)$ are
represented by $n \times n$ matrices. An important subset of $\mathcal{L}(\mathbb{R}^n)$ consists of the
invertible operators. These are one-to-one functions and are also described as
nonsingular. Saying that $T$ is an invertible operator means that for each $y$ in
$\mathbb{R}^n$, there is a unique $x$ in $\mathbb{R}^n$ such that $Tx = y$. The unique $x$ associated with
each $y$ in this way is denoted by $T^{-1}y$. This function $T^{-1}$ is called the inverse of
$T$. The linearity of $T$ implies that of $T^{-1}$ (Exercise 3); therefore, if $T$ is an
invertible member of $\mathcal{L}(\mathbb{R}^n)$, $T^{-1}$ also belongs to $\mathcal{L}(\mathbb{R}^n)$.

One way of telling whether a linear operator is invertible is to look at the
equation $Tx = y$ in matrix form. If $A$ is a matrix representing $T$, we have $Ax = y$,
or in scalar form,

$$\begin{pmatrix}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n
\end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}.$$

We shall assume that the reader has had enough experience with simultaneous
linear systems like this to know that: For each $y$ in $\mathbb{R}^n$, a necessary and sufficient
condition that there exist a unique solution $x$ is that the determinant of the
coefficient matrix be different from zero. Such matrices are called nonsingular or
invertible. So the invertible linear operators are those which are represented by
invertible (or nonsingular) matrices.

Another important characterization of invertible operators is obtained by
applying the italicized proposition of the preceding paragraph to the special case
where all the $y$'s in the above system are zero. In this case, the linear system is
said to be homogeneous. It is obvious at a glance that a homogeneous system has
the trivial solution $x_1 = x_2 = \cdots = x_n = 0$. If the determinant is different from
zero, this trivial solution must be the only one; in other words, a nonsingular
operator $T$ maps only the zero vector into the zero vector. But if the determinant
is zero, that is, if the operator is singular, then the trivial solution is not
unique and this means that $T$ maps some nonzero vector into the zero vector.
Hence, a necessary and sufficient condition that a linear operator be invertible is
that it map only the zero vector into the zero vector. It therefore follows that if a
linear operator maps only the zero vector into the zero vector, then it must be
invertible, and this is the property which we shall use in §11.10 to prove that
certain elements of $\mathcal{L}(\mathbb{R}^n)$ are invertible.

Suppose that $T$ is invertible. We have defined $T^{-1}y$ to be the unique solution
of the equation $Tx = y$. Since there is a unique solution for all $y$, we have

$$T(T^{-1}y) = y \quad \text{for all } y.$$

This means that the composition $T \circ T^{-1} = I$, where $I$ is the identity
transformation (i.e., $I$ maps each vector into itself).

As is the case with all invertible functions, $T^{-1}$ is also invertible, and in fact,
$(T^{-1})^{-1} = T$. This means that for all $y$, the equation $T^{-1}x = y$ has the unique
solution $Ty$ (i.e., $T^{-1}(Ty) = y$ for all $y$). Another way to say this is that
$T^{-1} \circ T = I$. This and the preceding paragraph together give us that

$$T \circ T^{-1} = T^{-1} \circ T = I.$$

In other words, every invertible linear operator commutes with its inverse and
their composition is the identity operator. Conversely, it is easy to show
(Exercise 19) that if two linear operators commute and their composition is the
identity operator, then each is the inverse of the other. From this and the fact
that composition is associative (Exercise 17), it is easy to prove (Exercise 18)
that if $T$ and $L$ are invertible then their composition is invertible, and that

$$(T \circ L)^{-1} = L^{-1} \circ T^{-1}.$$

That is, the inverse of the composition of two invertible operators is the
composition of their inverses in the opposite order.

Suppose that $A$ and $A^{-1}$ are the matrix representations of $T$ and $T^{-1}$,
respectively. Now, matrix multiplication was defined in such a way as to make
the product of two matrices represent the composition, in the same order, of the
operators which they represent; it follows, therefore, that

$$AA^{-1} = A^{-1}A = I.$$

Here we have used the same symbol, $I$, to represent both the identity operator
and the so-called identity matrix which represents it. Thus, every nonsingular
matrix commutes with its inverse, and the product is the identity matrix. From
the fact that

$$Ix = x \quad \text{for all } x,$$

it is easily seen that $I$ is the $n \times n$ matrix with 1's along the main diagonal and 0's
elsewhere.

If $L$ and $T$ are any two members of $\mathcal{L}(\mathbb{R}^n)$, then $L \circ T \in \mathcal{L}(\mathbb{R}^n)$ and by
(11.8-3)

$$\|(L \circ T)(x)\| = \|L(T(x))\| \le \|L\| \cdot \|T(x)\| \quad \text{for all } x.$$

Using (11.8-3) again, we get

$$\|(L \circ T)(x)\| \le \|L\|\,\|T\|\,\|x\| \quad \text{for all } x \in \mathbb{R}^n.$$

By the way we defined the norm of a linear transformation, this says that

$$\|L \circ T\| \le \|L\|\,\|T\|. \qquad (11.9\text{-}1)$$

In words, the norm of the composition of two linear operators is less than or
equal to the product of their norms. Suppose now that $T$ happens to be
invertible. Then $T \circ T^{-1} = I$, and since it is obvious that $\|I\| = 1$, (11.9-1) gives

$$\|T\|\,\|T^{-1}\| \ge 1, \qquad (11.9\text{-}2)$$

or, the norm of the inverse of a linear operator is greater than or equal to the
reciprocal of the norm of the operator.
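A small worked example, chosen here only for illustration, shows that the inequality (11.9-2) can be strict. Take the diagonal operator on $\mathbb{R}^2$ with matrix

$$A = \begin{pmatrix} 2 & 0 \\ 0 & \tfrac12 \end{pmatrix}, \qquad A^{-1} = \begin{pmatrix} \tfrac12 & 0 \\ 0 & 2 \end{pmatrix}.$$

For a diagonal matrix the norm is the largest absolute value of a diagonal entry, so $\|T\| = 2$ and $\|T^{-1}\| = 2$, and therefore $\|T\|\,\|T^{-1}\| = 4 > 1$. For $T = I$, on the other hand, both norms are 1 and equality holds in (11.9-2).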

Since the inverse $T^{-1}$ of an invertible linear operator $T$ is also linear
(Exercise 3), it too has a matrix representation. If $A$ is the matrix which
represents $T$, then the matrix representation of $T^{-1}$ is denoted by $A^{-1}$. There are
ways of constructing $A^{-1}$ when $A$ is given, but we leave most of this to a course
in matrix algebra. However, we shall have occasion later to invert a $2 \times 2$ matrix.
This is easily accomplished by the following three-step process:

(i) Exchange positions of the two main diagonal elements.

(ii) Change the signs of the two off-diagonal elements.

(iii) Now divide each of the four elements by the determinant.

In other words, if

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad \text{then} \quad A^{-1} = \frac{1}{\Delta}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix},$$

where $\Delta = ad - bc$. Remember that, by Cramer's rule, $\Delta \ne 0$ if and only if $A$ is
invertible. It is easily verified that the displayed matrices are indeed mutually
inverse.
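The three-step process can be sketched in a few lines of code. This is an illustrative sketch only; the function names `inverse_2x2` and `matmul_2x2` and the sample matrix are assumptions of the example, and exact rational arithmetic is used so that the check $AA^{-1} = A^{-1}A = I$ is not blurred by rounding.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Invert a 2 x 2 matrix [[a, b], [c, d]] by the three-step rule."""
    (a, b), (c, d) = A
    det = Fraction(a) * d - Fraction(b) * c
    if det == 0:
        raise ValueError("matrix is singular: determinant is zero")
    # (i) swap the diagonal entries, (ii) negate the off-diagonal entries,
    # (iii) divide every entry by the determinant.
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul_2x2(A, B):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2, 1], [5, 3]]             # hypothetical example, determinant 2*3 - 1*5 = 1
A_inv = inverse_2x2(A)
identity = [[1, 0], [0, 1]]
print(A_inv, matmul_2x2(A, A_inv) == identity, matmul_2x2(A_inv, A) == identity)
```

A singular input such as `[[1, 2], [2, 4]]` raises `ValueError`, reflecting the requirement that the determinant be different from zero.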

11.10 / THE SET OF INVERTIBLE OPERATORS

Let us denote by $\Omega$ the set of invertible operators belonging to the vector space
$\mathcal{L}(\mathbb{R}^n)$. In Chapter 12 we shall use the important fact that $\Omega$ is an open set in
$\mathcal{L}(\mathbb{R}^n)$. This means that if $T \in \Omega$ and $L$ is an operator such that $\|T - L\|$ is
sufficiently small, then $L \in \Omega$. To prove this we first prove some preliminary
results.

LEMMA. If $T \in \mathcal{L}(\mathbb{R}^n)$ and $\|T\| < 1$, then $I - T$ is invertible, i.e. $(I - T) \in \Omega$, and

$$\|(I - T)^{-1}\| \le \frac{1}{1 - \|T\|}.$$

Proof. Consider any $x \ne 0$. We shall show that $(I - T)x \ne 0$, which implies
that $I - T$ is invertible. Now $\|(I - T)x\| = \|x - Tx\| \ge \bigl|\,\|x\| - \|Tx\|\,\bigr|$, by (11.5-3). But
$\|Tx\| \le \|T\|\,\|x\|$. Therefore

$$\|(I - T)x\| \ge \|x\|(1 - \|T\|) > 0 \qquad (11.10\text{-}1)$$

because $\|T\| < 1$ and $\|x\| > 0$. Therefore $(I - T)^{-1}$ exists.

To estimate the norm of $(I - T)^{-1}$ we can substitute for $x$ in (11.10-1) the
vector $(I - T)^{-1}y$, where $y$ is an arbitrary vector in $\mathbb{R}^n$. On the left we get
$\|(I - T)(I - T)^{-1}y\| = \|Iy\| = \|y\|$.

Therefore, $\|y\| \ge \|(I - T)^{-1}y\|(1 - \|T\|)$, or

$$\|(I - T)^{-1}y\| \le \frac{\|y\|}{1 - \|T\|} \quad \text{for all } y \in \mathbb{R}^n.$$

This gives the inequality for $\|(I - T)^{-1}\|$ stated in the lemma.
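To see that the bound in the lemma cannot be improved in general, consider the following example, which is our own illustration rather than part of the proof. Let $T$ be the operator on $\mathbb{R}^2$ with diagonal matrix

$$\begin{pmatrix} \tfrac12 & 0 \\ 0 & -\tfrac14 \end{pmatrix}, \quad \text{so that} \quad I - T = \begin{pmatrix} \tfrac12 & 0 \\ 0 & \tfrac54 \end{pmatrix}, \qquad (I - T)^{-1} = \begin{pmatrix} 2 & 0 \\ 0 & \tfrac45 \end{pmatrix}.$$

Since the norm of a diagonal operator is the largest absolute value of its diagonal entries, $\|T\| = \tfrac12 < 1$ and $\|(I - T)^{-1}\| = 2 = 1/(1 - \|T\|)$, so equality holds in the lemma for this $T$.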

COROLLARY. If $\|I - T\| < 1$, then $T$ is invertible.

We deduce the corollary as an application of the lemma by putting $I - T$ in
place of $T$ in the lemma, observing that $I - (I - T) = T$. Note that the corollary
states that the entire open ball centered at $I$, with radius 1, lies in $\Omega$.

THEOREM IV. The set of invertible operators is an open set in $\mathcal{L}(\mathbb{R}^n)$. In fact, if
$T \in \Omega$ and $\|T - L\| < 1/\|T^{-1}\|$, then $L \in \Omega$. Moreover,

$$\|L^{-1}\| \le \frac{\|T^{-1}\|}{1 - \|T^{-1}(T - L)\|}. \qquad (11.10\text{-}1)$$

Proof. We use the fact that $T$ is invertible to write $L = T - (T - L) =
T[I - T^{-1}(T - L)]$. By (11.9-1) and our hypothesis,

$$\|T^{-1}(T - L)\| \le \|T^{-1}\|\,\|T - L\| < 1.$$

Using the lemma (with $T^{-1}(T - L)$ in place of $T$) we conclude that
$I - T^{-1}(T - L)$ is invertible. Since $L$ is the composition of the two operators $T$ and
$I - T^{-1}(T - L)$, both of which are now known to be invertible, we have

$$L^{-1} = [I - T^{-1}(T - L)]^{-1}\,T^{-1}.$$

By (11.9-1) and an application of the inequality in the lemma, we obtain (11.10-1).


Notice that Theorem IV tells us that each invertible operator $T$ in $\mathcal{L}(\mathbb{R}^n)$ is the
center of an open ball whose radius is the reciprocal of the norm of the inverse
of $T$, and each of whose points is itself an invertible operator.

THEOREM V. The function from $\Omega$ to $\Omega$ which associates with $T \in \Omega$ its
inverse, $T^{-1}$, is continuous.


Proof. We have to show that, if $T \in \Omega$, we can make $\|L^{-1} - T^{-1}\|$ as small as
we like by requiring $\|L - T\|$ to be suitably small. Theorem IV gives us nearly all
we need. To begin with, suppose $\|L - T\| < 1/\|T^{-1}\|$. Then we know that $L^{-1}$ exists.
Observe that

$$T^{-1}(T - L)L^{-1} = (I - T^{-1}L)L^{-1} = L^{-1} - T^{-1},$$

and therefore, by a couple of applications of (11.9-1),

$$\|L^{-1} - T^{-1}\| \le \|T^{-1}\|\,\|T - L\|\,\|L^{-1}\|.$$

Then, by (11.10-1),

$$\|L^{-1} - T^{-1}\| \le \frac{\|T^{-1}\|^2\,\|T - L\|}{1 - \|T^{-1}(T - L)\|}.$$

But $\|T^{-1}(T - L)\| \le \|T^{-1}\|\,\|T - L\|$. Let us bring $L$ still closer to $T$ by requiring
that $\|L - T\| < \dfrac{1}{2\|T^{-1}\|}$. Then we see from the foregoing that
$1 - \|T^{-1}(T - L)\| > 1/2$ and therefore

$$\|L^{-1} - T^{-1}\| \le 2\|T^{-1}\|^2\,\|T - L\|.$$

From this we immediately draw the desired conclusion.

We conclude by asking: Is $\Omega$ a vector space within $\mathcal{L}(\mathbb{R}^n)$? Is the
correspondence between $T$ and $T^{-1}$ linear?


EXERCISES 

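1. Let $A$ be an $m \times n$ matrix. Show that the function $f: \mathbb{R}^n \to \mathbb{R}^m$ defined in §11.3 by
$f(x) = Ax$ is a linear transformation.

2. Given that $T \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$ and $L \in \mathcal{L}(\mathbb{R}^m, \mathbb{R}^p)$, prove directly from the definition of
linearity that $L \circ T \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^p)$.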

3. Prove directly from the definition of a linear transformation that the inverse of
an invertible linear operator is a linear operator.

4. (a) Show that the transpose of the sum of two matrices is equal to the sum of
their transposes.

(b) If the vector $x$ in $\mathbb{R}^m$ is thought of as an $(m \times 1)$ matrix, then its transpose $x^T$ is a
$(1 \times m)$ matrix. Show that for any $(n \times m)$ matrix $A$ and any $x \in \mathbb{R}^m$,

$$(Ax)^T = x^T A^T.$$

Then generalize this to show that

$$(AB)^T = B^T A^T$$

in all cases where $A$ and $B$ are matrices such that their product, $AB$, is defined.

(c) If $A$ is an $(n \times m)$ matrix, $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$, then

$$(y^T A)x = y^T(Ax),$$

proving that matrix multiplication is associative in this very special case.

(d) Show that for every matrix $A$,

$$(A^T)^T = A.$$

5. Show that (11.5-1) and property (3) of a norm in §11.5 imply property (4) of that 
norm by making suitable replacements for x and y in (11.5-1). 

6. For each $x \in \mathbb{R}^n$, let

$$\|x\|_p = \Bigl(\sum_{i=1}^{n} |x_i|^p\Bigr)^{1/p}.$$

Show that for each $p \ge 1$, $\|\ \|_p$ is a norm on $\mathbb{R}^n$. Hint: Use Minkowski's inequality
(Exercise 32, §6.8).

7. Let $\|\ \|_1$ and $\|\ \|_m$, with the subscript $m$ standing for "max," be two functions
from $\mathbb{R}^n$ to $\mathbb{R}$ defined as follows:

$$\|x\|_1 = \sum_{i=1}^{n} |x_i|,$$

$$\|x\|_m = \max_{1 \le i \le n} |x_i|.$$

Show that $\|\ \|_1$ and $\|\ \|_m$ are norms on $\mathbb{R}^n$. For the special case where $n = 2$, draw the unit
sphere (circle) in these two norms on the same co-ordinate axes with the unit sphere in the
Euclidean norm.

8. Prove that if $f$ and $g$ are continuous, real-valued functions on $[a, b]$, then

$$\Bigl|\int_a^b f(x)g(x)\,dx\Bigr| \le \Bigl(\int_a^b f^2(x)\,dx\Bigr)^{1/2} \Bigl(\int_a^b g^2(x)\,dx\Bigr)^{1/2}.$$

What condition is both necessary and sufficient for equality? Hint: Since

$$\int_a^b [\lambda f(x) + g(x)]^2\,dx \ge 0 \quad \text{for all } \lambda,$$

the quadratic equation in $\lambda$,

$$\lambda^2 \int_a^b f^2(x)\,dx + 2\lambda \int_a^b f(x)g(x)\,dx + \int_a^b g^2(x)\,dx = 0,$$

cannot have two distinct real roots, and therefore the discriminant cannot be positive.
This inequality, which is reminiscent of the Cauchy inequality, is known as the
Schwarz inequality. Notice that the Cauchy inequality can be proved by the method used
here, starting from the fact that $\sum_{i=1}^{n} (\lambda a_i + b_i)^2 = 0$ can be written as a quadratic equation
in $\lambda$ which obviously cannot have distinct real roots. The similarity between these two
inequalities has led many authors to lump them together under the same name, the
Cauchy-Schwarz inequality.

We have assumed that $f$, $g$, $f^2$, $g^2$, $fg$, and $(\lambda f + g)^2$ are all integrable. These things are
proved in Chapter 18.

9. Let $\mathcal{C}$ denote the vector space of continuous functions on $[0, 1]$ (see Example 2,
§11) and let $\|\ \|_1$ and $\|\ \|_2$ be functions from $\mathcal{C}$ to $\mathbb{R}$ defined as follows:

$$\|f\|_1 = \max_{0 \le x \le 1} |f(x)|,$$

$$\|f\|_2 = \Bigl[\int_0^1 f^2(x)\,dx\Bigr]^{1/2}.$$

Prove that $\|\ \|_1$ and $\|\ \|_2$ are norms on $\mathcal{C}$. Hint: For $\|\ \|_2$ use Exercise 8.

10. Prove that if $\|\ \|$ is any norm on a vector space, $V$, then the function $d$ defined by

$$d(x, y) = \|x - y\| \quad \text{for all } x \text{ and } y \text{ in } V$$

is a metric.

11. Prove that if $V$ is any vector space and $d$ is the function defined by

$$d(x, y) = 0 \ \text{ if } x = y, \quad \text{and} \quad d(x, y) = 1 \ \text{ otherwise},$$

then $d$ is a metric. Prove that there cannot exist any norm, $\|\ \|$, on $V$ such that this particular
metric is given by $\|x - y\|$ for all $x$ and $y$ in $V$.

12. Let $\rho$ denote the usual distance function in the plane, $\mathbb{R}^2$, and define

$$d(x, y) = \frac{\rho(x, y)}{1 + \rho(x, y)}.$$

Show that $d$ is also a metric on the plane. Show that it is not possible to define a norm on
the plane such that $d$ is expressible in terms of this norm as in Exercise 10.

13. Prove (11.8-3). 

14. Let $S(0, 1)$ denote the Euclidean unit sphere in $\mathbb{R}^n$, that is

$$S(0, 1) = \Bigl\{x \in \mathbb{R}^n : \sum_{i=1}^{n} x_i^2 = 1\Bigr\}.$$

Prove that $S(0, 1)$ is closed.

15. Let $\{x_k\}_{k=1}^{\infty}$ denote a sequence in $\mathbb{R}^n$, that is, $x_k = (x_1^k, x_2^k, \ldots, x_n^k)$. Prove that a
necessary and sufficient condition that

$$\lim_{k \to \infty} x_k = y = (y_1, y_2, \ldots, y_n)$$

is that $\lim_{k \to \infty} x_i^k = y_i$ for $i = 1, 2, \ldots, n$.

16. If $x$ and $y$ are any two points in $\mathbb{R}^n$, then the equation of the straight line
determined by them can be written

$$z = x + t(y - x).$$

For $n = 2$ you may have thought of this as the parametric representation of the line, $t$
being the parameter. In general, the right-hand side is a function from $\mathbb{R}$ to $\mathbb{R}^n$. Notice that
the function maps the interval $0 \le t \le 1$ into that linear segment between $x$ and $y$
inclusive. This means that we can define the line segment determined by $x$ and $y$ to be the set
of points of the form $(1 - t)x + ty$ for $t \in [0, 1]$. Equivalently, we can define this segment to be
the set of points

$$\lambda_1 x + \lambda_2 y,$$

where $\lambda_1 \ge 0$, $\lambda_2 \ge 0$ and $\lambda_1 + \lambda_2 = 1$. To say that a subset $E$ of a vector space is convex
means that if $x$ and $y$ belong to $E$, then the line segment determined by $x$ and $y$ also lies in
$E$. Prove that every ball, open or closed, is convex.

17. Suppose that $f: A \to B$; $g: B \to C$; and $h: C \to D$. Since composition of functions
is a binary operation, we get a function from $A$ to $D$ by forming either $h \circ (g \circ f)$ or
$(h \circ g) \circ f$. Show that these two functions are the same. In other words show that although
composition of functions is not necessarily commutative, it is always associative.

18. Using Exercise 17 show that if $L$ and $T$ are invertible linear operators, then $L \circ T$
is invertible, and $(L \circ T)^{-1} = T^{-1} \circ L^{-1}$. That is, the inverse of the composition of two
invertible operators is the composition of their inverses in the opposite order.

19. Show that if $T$ and $L$ are two members of $\mathcal{L}(\mathbb{R}^n)$ such that $T \circ L = L \circ T = I$,
then they must be invertible and each is the inverse of the other.

20. Let $A$ and $B$ denote linear operators on $\mathbb{R}^n$. Prove that if $B$ is invertible and $A$
commutes with $B$, then $A$ commutes with $B^{-1}$.

21. Show that if $T \in \mathcal{L}(\mathbb{R}^n)$ and $\|T\| < 1$, then

$$(I - T)^{-1} = I + T + T^2 + \cdots + T^k + T^{k+1}(I - T)^{-1}$$

for every positive integer $k$.

22. Show that the set of invertible operators is not a bounded subset of $\mathcal{L}(\mathbb{R}^n)$.

23. Show that the function $f(T) = T^{-1}$, defined for $T \in \Omega$ (as defined in §11.10), is
not uniformly continuous. That is, show that in choosing $\delta$ so that $\|T - T_0\| < \delta$ implies
$\|T^{-1} - T_0^{-1}\| < \epsilon$, where $T_0$ and $\epsilon$ are preassigned, $\delta$ cannot be chosen independently of $T_0$.

12 / INTRODUCTION

The purpose of this chapter is to unify much of our previous work by developing
differential calculus in the general setting of functions from $\mathbb{R}^n$ to $\mathbb{R}^m$. This will
include the subjects of transformations introduced in Chapter 9, real-valued
functions of several variables (i.e., functions from $\mathbb{R}^n$ to $\mathbb{R}$), vector-valued
functions of a single variable (i.e., functions from $\mathbb{R}$ to $\mathbb{R}^n$), and even the
one-variable functions of elementary calculus as special cases. A function from
$\mathbb{R}^n$ to $\mathbb{R}^m$ is an ordered $m$-tuple of real-valued functions of $n$ real variables, for
example,

$$\begin{aligned}
y_1 &= f^{(1)}(x_1, x_2, \ldots, x_n) \\
y_2 &= f^{(2)}(x_1, x_2, \ldots, x_n) \\
&\ \,\vdots \\
y_m &= f^{(m)}(x_1, x_2, \ldots, x_n).
\end{aligned} \qquad (12\text{-}1)$$

Here we have used superscripts to distinguish the component functions (or
co-ordinate functions, as they are sometimes called), so that we can continue to
use subscripts to denote partial derivatives. The vector $(x_1, x_2, \ldots, x_n)$ of $\mathbb{R}^n$ is
mapped into the vector $(y_1, y_2, \ldots, y_m)$ of $\mathbb{R}^m$. If we denote the first of these by $x$
and the second by $y$, and if we let $f$ denote the ordered $m$-tuple of functions
$(f^{(1)}, f^{(2)}, \ldots, f^{(m)})$, then we can abbreviate the notation of our function (12-1) to

$$y = f(x).$$

Such functions sometimes go by the name of transformations, as in Chapter 9.

The special case where all of the $f^{(i)}$'s are linear has already been studied, in
Chapter 11. In that case, (12-1) reduces to

$$\begin{aligned}
y_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
y_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
&\ \,\vdots \\
y_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n.
\end{aligned} \qquad (12\text{-}2)$$

This is frequently reduced to the matrix equation

$$y = Ax,$$

where $A$ is the $m \times n$ coefficient matrix. It is important to think of (12-2) as a
special case of (12-1), and to see that this special case consists of all the linear
transformations from $\mathbb{R}^n$ to $\mathbb{R}^m$. This is the vector space which we have already
studied under the name $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$.

The relevance of linear algebra to differential calculus consists in the fact 
that it is possible to obtain very good local approximations to quite general 
functions, such as (12-1), by using linear functions, such as (12-2). We shall see 
that this enables us to deduce important information about the behavior of a 
nonlinear function near a point by studying the linear (or affine) functions of best 
approximation at that point. We begin by extending the idea of a differential to 
our more general setting. 


12.1 / THE DIFFERENTIAL AND THE DERIVATIVE 

Our first objective here is to extend in a suitable way for functions from $\mathbb{R}^n$ to
$\mathbb{R}^m$ the definition given in §6.4 of differentiability and the differential for
functions from $\mathbb{R}^2$ or $\mathbb{R}^n$ to $\mathbb{R}$ [see (6.4-4) and (6.4-17)]. Our second objective is
to extend in a suitable way for functions from $\mathbb{R}^n$ to $\mathbb{R}^m$ the relationship between
differentials and derivatives that exists in the case of a function from $\mathbb{R}$ to $\mathbb{R}$, as
set forth in §1.3. Then we shall show that a differentiable function from $\mathbb{R}^n$ to $\mathbb{R}^m$
is continuous, and we shall state and prove the general chain rule for
differentiable functions.

In elementary calculus it has long been customary to introduce the derivative
first, and then the differential. For functions from $\mathbb{R}^n$ to $\mathbb{R}^m$, where $n > 1$, it
is natural to begin with the differential and come to the derivative afterward. In
fact, the concept of the derivative when $n > 1$ is more sophisticated than the
concept of the derivative in §1.3; it requires us to think of the derivative as a
function from $\mathbb{R}^n$ to $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$. But when $n = m = 1$ the more sophisticated point
of view is in full harmony with the elementary point of view in §1.3.

In §6.4, and later in §7, we discussed the notion of the differential of a real
function of several real variables. The differential of a function $f$ from $\mathbb{R}^n$ to $\mathbb{R}$ is
a function of $(x_1, \ldots, x_n)$ and $(dx_1, \ldots, dx_n)$ whose value is

$$\frac{\partial f}{\partial x_1}\,dx_1 + \cdots + \frac{\partial f}{\partial x_n}\,dx_n, \qquad (12.1\text{-}1)$$

where the partial derivatives are evaluated at $(x_1, \ldots, x_n)$. Thus the differential is
a linear function of $(dx_1, \ldots, dx_n)$ when we keep $(x_1, \ldots, x_n)$ fixed. But the
definition of the differential requires more of $f$ than merely that it have first
partial derivatives with respect to each of the variables $x_1, \ldots, x_n$. If we use the
vector notation

$$x = (x_1, \ldots, x_n), \qquad h = (h_1, \ldots, h_n),$$

the function $f$ from $\mathbb{R}^n$ to $\mathbb{R}$, defined in a neighborhood of $x$, is said to be
differentiable at $x$ if there exist numbers $A_1, \ldots, A_n$, depending on $f$ and $x$, such
that

$$\lim_{h \to 0} \frac{\bigl|f(x + h) - f(x) - \sum_{i=1}^{n} A_i h_i\bigr|}{\|h\|} = 0. \qquad (12.1\text{-}2)$$

It is then necessarily the case that $f$ has first partial derivatives at $x$ given by

$$\frac{\partial f}{\partial x_i} = A_i. \qquad (12.1\text{-}3)$$

When $f$ is differentiable, its differential, which is the linear function of $h$ with
value $\sum_{i=1}^{n} A_i h_i$, can be written in the form (12.1-1) if we put $h_i = dx_i$. What we
have given here is merely a restatement of what was given in connection with
(6.4-17).

A student who wishes to review the discussion of differentials earlier in this 
book will find it appropriate to look at the early part of §1.3, parts of §6.4, and 
§7. The crucial portions are at (1.3-1), (6.4-4), (6.4-17), and (7-2). 

For the general case of a function $f$ from a neighborhood of some point $x$ in
$\mathbb{R}^n$ to $\mathbb{R}^m$ the definition of differentiability and the differential is modeled very
closely after the case in which $m = 1$. The guiding idea is to seek a linear
function of $h$ from $\mathbb{R}^n$ to $\mathbb{R}^m$ to take the place of $\sum_{i=1}^{n} A_i h_i$ in (12.1-2), and to use
the norm instead of the absolute value used in the numerator of (12.1-2). The
essential thing is to regard the linear function of $h$ as an approximation to the
difference $f(x + h) - f(x)$, and to express the sense in which the approximation is
better and better as $\|h\| \to 0$. Now, a linear function from $\mathbb{R}^n$ to $\mathbb{R}^m$ is simply a
member of $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$. So, for a given $x$, we are seeking a $T \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$,
depending on $x$, of such a nature that $Th$ is a good approximation to
$f(x + h) - f(x)$ when $\|h\|$ is small. The requirement we impose is that the difference

$$f(x + h) - f(x) - Th$$

be small in comparison with $\|h\|$ when $\|h\|$ is small. The precise formulation of the
requirement is that

$$\lim_{h \to 0} \frac{\|f(x + h) - f(x) - Th\|}{\|h\|} = 0. \qquad (12.1\text{-}4)$$

When this condition is satisfied for the given $x$ we say that $f$ is differentiable at $x$,
and that the differential of $f$ is the function of $x$ and $h$ whose value is $Th$. The
differential is defined for each $x$ at which the differentiability requirement is
satisfied and for every $h$ in $\mathbb{R}^n$. The value of the differential is the vector $Th$ in
$\mathbb{R}^m$. We denote the differential as a function by $df$ and its value by $df(x, h)$.

It is important to know that when (12.1-4) is satisfied by a certain $T$ in
$\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$, that $T$ is unique. Such is indeed the case. The proof is left as an
exercise (Exercise 1). In a later section we shall see in a different way that $T$ is
unique, by finding its standard matrix representation. The elements of the matrix
that represents $T$ are first partial derivatives of the component functions of $f$, as
given in (12-1).

As indicated earlier, $T$ may be expected to depend on $x$. To indicate this
dependency we write $T_x$. The use of $x$ as a subscript is to be distinguished
carefully from a notation such as $Tx$ or $T(x)$, which would indicate that $T$ is
acting on $x$, transforming it into a vector in $\mathbb{R}^m$.

When $f$ is differentiable at one or more points there is an associated function
from $\mathbb{R}^n$ to $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$. Its domain is the set of those $x$'s at which $f$ is
differentiable, and the value of the function at $x$ is $T_x$. We shall call this function the
derivative of $f$ and denote it by $f'$. Thus, by definition,

$$f'(x) = T_x. \qquad (12.1\text{-}5)$$

With this definition we can say that $f$ has a derivative $f'(x)$ at $x$ if and only if $f'(x)$
is an element of $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$ such that

$$\lim_{h \to 0} \frac{\|f(x + h) - f(x) - f'(x)h\|}{\|h\|} = 0. \qquad (12.1\text{-}6)$$

We must remember that the notation $f'(x)h$ signifies that $f'(x)$ is a linear
transformation that maps $h$ into the element $f'(x)h$ of $\mathbb{R}^m$.
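The limit in (12.1-6) can be watched numerically. The following is a minimal sketch under assumptions of our own: the map $f(x_1, x_2) = (x_1x_2,\ x_1 + x_2^2)$ is a hypothetical example, and the candidate $T_x$ is taken to be the matrix of first partial derivatives, as the text announces will be justified in a later section. The names `candidate_derivative` and `quotient` are illustrative only.

```python
import math

def f(x):
    """Example map from R^2 to R^2 (chosen only for illustration)."""
    return (x[0] * x[1], x[0] + x[1] ** 2)

def candidate_derivative(x):
    """Matrix of first partial derivatives of f at x, used as the candidate T_x."""
    return ((x[1], x[0]),
            (1.0, 2.0 * x[1]))

def apply(matrix, h):
    """Apply a matrix (tuple of rows) to the vector h."""
    return tuple(sum(row[j] * h[j] for j in range(len(h))) for row in matrix)

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def quotient(x, h):
    """||f(x + h) - f(x) - T_x h|| / ||h||, the quantity in (12.1-6)."""
    shifted = tuple(xi + hi for xi, hi in zip(x, h))
    Th = apply(candidate_derivative(x), h)
    remainder = tuple(a - b - c for a, b, c in zip(f(shifted), f(x), Th))
    return norm(remainder) / norm(h)

x0 = (1.0, 2.0)
steps = (1e-1, 1e-2, 1e-3)
quotients = [quotient(x0, (t, t)) for t in steps]
print(quotients)
```

For this particular $f$ the remainder is $(h_1h_2,\ h_2^2)$, so the quotient equals $|h_2|$ and shrinks in proportion to $\|h\|$, as the definition requires.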

The relation between the differential $df$ and the derivative $f'$ is expressed in
the equation

$$df(x, h) = f'(x)h.$$

If we write $dx = (dx_1, \ldots, dx_n)$ in place of $h = (h_1, \ldots, h_n)$, we can write the
foregoing in the form

$$df(x, dx) = f'(x)\,dx. \qquad (12.1\text{-}7)$$

Here $dx$ denotes an arbitrary element of $\mathbb{R}^n$. We are immediately reminded by
(12.1-7) of the familiar expression $f'(x)\,dx$ for the differential of a real function
of a real variable, in which $f'(x)$ is a real number. Now, when $m = n = 1$, a
member of $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$ is represented by a $1 \times 1$ matrix consisting of a single real
number. So, if we think of $f'(x)\,dx$ in (12.1-7) as the product of an $m \times n$ matrix
times an $n \times 1$ matrix, in the case when $m = n = 1$ we have merely the product of
two real numbers. Thus the presentation of the differential in §1.3 is really just a
special case of the general presentation we have here.

We come now to the first important theorem about differentiable functions. 

THEOREM I. If f is differentiable at a, it is continuous at a. 

We leave the proof as an exercise (Exercise 2). 

Another important theorem about differentiable functions is the theorem 
known as the chain rule. It asserts that the composition of two differentiable 
functions is differentiable, and it provides the rule for finding the derivative of 
the composite function. Here is the precise statement. 

THEOREM II. (THE CHAIN RULE). Consider two functions $f$ and $g$ and the
composite function $g \circ f$ in the following situation: $f$ maps an open subset $U$ of
$\mathbb{R}^n$ into $\mathbb{R}^m$, with $x_0$ in $U$ and $y_0 = f(x_0)$; $g$ is defined on an open set $V$ in $\mathbb{R}^m$
that contains $f(x)$ for all $x$ in $U$, and $g$ maps $V$ into $\mathbb{R}^p$. Assume that $f$ is
differentiable at $x_0$ and that $g$ is differentiable at $y_0$. Then $g \circ f$, which maps $U$
into $\mathbb{R}^p$, is differentiable at $x_0$, and

$$(g \circ f)'(x_0) = g'(y_0) \circ f'(x_0). \qquad (12.1\text{-}8)$$

Proof. For notational convenience, we shall represent $f'(x_0)$ by $T$ and $g'(y_0)$ by
$L$. Then $T \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$, $L \in \mathcal{L}(\mathbb{R}^m, \mathbb{R}^p)$, and $g'(y_0) \circ f'(x_0) = LT \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^p)$. What
we wish to prove is that $g \circ f$ has a derivative at $x_0$ given by $(g \circ f)'(x_0) = LT$; that is,
that

$$\lim_{h \to 0} \frac{\|g[f(x_0 + h)] - g[f(x_0)] - LTh\|}{\|h\|} = 0. \qquad (12.1\text{-}9)$$

We can prove this by straightforward calculation, using the differentiability
condition for $f$ in the form (12.1-4) with $x = x_0$ and a corresponding condition for
$g$ at $y_0$. Let us assume $h \ne 0$ and define

$$F(h) = \frac{f(x_0 + h) - f(x_0) - Th}{\|h\|};$$

we assume that $\|h\|$ is small enough to keep $x_0 + h$ in $U$. The differentiability of $f$
at $x_0$ means that

$$\lim_{h \to 0} \|F(h)\| = 0. \qquad (12.1\text{-}10)$$

If $k \ne 0$ and if $\|k\|$ is small enough to keep $y_0 + k$ in $V$ we define

$$G(k) = \frac{g(y_0 + k) - g(y_0) - Lk}{\|k\|}.$$

The differentiability of $g$ at $y_0$ means that

$$\lim_{k \to 0} \|G(k)\| = 0. \qquad (12.1\text{-}11)$$

Finally, if $h \ne 0$ we define

$$H(h) = \frac{g[f(x_0 + h)] - g[f(x_0)] - LTh}{\|h\|}.$$

Then (12.1-9) can be rewritten as

$$\lim_{h \to 0} \|H(h)\| = 0. \qquad (12.1\text{-}12)$$

Our task is to convert the expression for $H(h)$ into a form that allows us to
get an estimate of the size of $\|H(h)\|$ and show that (12.1-12) is true. Now, $h$ is
our independent variable; we are going to make $k$ depend on $h$, setting

$$k = f(x_0 + h) - f(x_0). \qquad (12.1\text{-}13)$$

Here we cannot be sure that $k \ne 0$, so we complete the definition of $G$, setting
$G(0) = 0$. Because $y_0 = f(x_0)$ we can write $f(x_0 + h) = y_0 + k$, and so

$$H(h) = \frac{g(y_0 + k) - g(y_0) - LTh}{\|h\|}.$$

From the definition of $G$ we see that

$$g(y_0 + k) - g(y_0) = \|k\|\,G(k) + Lk,$$

and so

$$H(h) = \frac{\|k\|\,G(k) + Lk - LTh}{\|h\|}. \qquad (12.1\text{-}14)$$

Now, from (12.1-13) and the definition of $F$ we see that

$$k = \|h\|\,F(h) + Th, \qquad (12.1\text{-}15)$$

and therefore

$$Lk = \|h\|\,L[F(h)] + LT(h).$$

Substituting this result into (12.1-14) we obtain

$$H(h) = \frac{\|k\|\,G(k)}{\|h\|} + L[F(h)]. \qquad (12.1\text{-}16)$$

From (12.1-15) we see that

$$\|k\| \le \|h\|\,\|F(h)\| + \|T\|\,\|h\|, \qquad (12.1\text{-}17)$$

from which we conclude that $\|k\| \to 0$ when $\|h\| \to 0$. Now we put the result of
(12.1-17) into the numerator on the right in (12.1-16) and simplify by canceling
out $\|h\|$ from the numerator and denominator. The result is

$$\|H(h)\| \le (\|F(h)\| + \|T\|)\,\|G(k)\| + \|L\|\,\|F(h)\|.$$

From this, using (12.1-10) and (12.1-11), we see that (12.1-12) is true. This is
what we wished to show, and so Theorem II is proved.


The general form of the chain rule in Theorem II contains as special cases 
the elementary chain rule of elementary calculus, given as Theorem II in §1.11 
and discussed in §1.3, as well as the chain rule of Theorem V in §7.3. Here we 
see how the use of vector space ideas and co-ordinate-free notation in multivari- 
able calculus enables us to unify some previous results and express the chain 
rule almost as simply in the general case as in the elementary case of Chapter 1. 
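As a concrete check of (12.1-8), the matrix product $g'(y_0)f'(x_0)$ can be compared with a finite-difference approximation to the derivative of $g \circ f$. This is a sketch under our own assumptions: the maps $f(x_1, x_2) = (x_1x_2,\ x_1 + x_2^2)$ and $g(y_1, y_2) = y_1^2 + 3y_2$ are hypothetical examples, their derivative matrices are written down from the partial derivatives, and `numerical_jacobian` is an illustrative central-difference helper.

```python
def f(x):
    """Example map from R^2 to R^2."""
    return (x[0] * x[1], x[0] + x[1] ** 2)

def f_prime(x):
    """Matrix of partial derivatives of f at x."""
    return [[x[1], x[0]], [1.0, 2.0 * x[1]]]

def g(y):
    """Example map from R^2 to R^1, returned as a 1-tuple."""
    return (y[0] ** 2 + 3.0 * y[1],)

def g_prime(y):
    """Matrix (one row) of partial derivatives of g at y."""
    return [[2.0 * y[0], 3.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def numerical_jacobian(func, x, step=1e-6):
    """Central-difference approximation to the matrix of partial derivatives."""
    columns = []
    for j in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[j] += step
        backward[j] -= step
        columns.append([(p - q) / (2 * step)
                        for p, q in zip(func(forward), func(backward))])
    # columns[j][i] approximates d(func_i)/dx_j; transpose into rows.
    return [[columns[j][i] for j in range(len(x))]
            for i in range(len(columns[0]))]

x0 = (1.0, 2.0)
y0 = f(x0)
chain_rule_matrix = matmul(g_prime(y0), f_prime(x0))         # g'(y0) f'(x0)
finite_difference = numerical_jacobian(lambda x: g(f(x)), x0)
print(chain_rule_matrix, finite_difference)
```

Here $g'(y_0)f'(x_0)$ is the $1 \times 2$ matrix $(11 \ \ 16)$, and the finite-difference matrix agrees with it to within the accuracy of the approximation.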


12.2 / THE COMPONENT FUNCTIONS AND DIFFERENTIABILITY 

When a function $f$ from $\mathbb{R}^n$ to $\mathbb{R}^m$ is differentiable, we naturally wonder what this
implies about the differentiability of the component functions $f^{(1)}, f^{(2)}, \ldots, f^{(m)}$,
where $f^{(i)}(x)$ is the $i$th component of the vector $f(x)$, so that

$$f(x) = (f^{(1)}(x), \ldots, f^{(m)}(x)). \qquad (12.2\text{-}1)$$

Also, if $f$ is differentiable, how is the derivative of $f$ represented by a matrix (in
the standard matrix representation of linear transformations discussed in §11.3)?
These questions are easily answered in the following theorem.

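THEOREM III. Let $f$ be a function from an open set $U$ in $\mathbb{R}^n$, mapping $U$ into
$\mathbb{R}^m$. Let $f$ have components $f^{(1)}, \ldots, f^{(m)}$ from $U$ into $\mathbb{R}$, as in (12.2-1). Then $f$
is differentiable at a point $a$ of $U$ if and only if each of the component
functions is differentiable at $a$. When $f$ is differentiable at $a$, the standard
matrix representation of $f'(a)$, as a member of $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$, is the Jacobian
matrix

$$\begin{pmatrix}
\dfrac{\partial f^{(1)}}{\partial x_1} & \dfrac{\partial f^{(1)}}{\partial x_2} & \cdots & \dfrac{\partial f^{(1)}}{\partial x_n} \\
\dfrac{\partial f^{(2)}}{\partial x_1} & \dfrac{\partial f^{(2)}}{\partial x_2} & \cdots & \dfrac{\partial f^{(2)}}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial f^{(m)}}{\partial x_1} & \dfrac{\partial f^{(m)}}{\partial x_2} & \cdots & \dfrac{\partial f^{(m)}}{\partial x_n}
\end{pmatrix} \qquad (12.2\text{-}2)$$

in which the partial derivatives are evaluated at $a$.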

Proof. Suppose that $f$ is differentiable at $a$. Then, by definition,

$$\lim_{h \to 0} \frac{\|f(a + h) - f(a) - f'(a)h\|}{\|h\|} = 0. \qquad (12.2\text{-}3)$$

Let the standard matrix representation of $f'(a)$ have $A_{ij}$ as its element in the $i$th
row and $j$th column, so that, if $h = (h_1, \ldots, h_n)$, we have

$$f'(a)h = \Bigl(\sum_{j=1}^{n} A_{1j}h_j,\ \sum_{j=1}^{n} A_{2j}h_j,\ \ldots,\ \sum_{j=1}^{n} A_{mj}h_j\Bigr).$$

Now, the $i$th component of the vector

$$f(a + h) - f(a) - f'(a)h$$

is

$$f^{(i)}(a + h) - f^{(i)}(a) - \sum_{j=1}^{n} A_{ij}h_j.$$

By the definition of the Euclidean norm in $\mathbb{R}^m$, the absolute value of the $i$th
component of a vector in $\mathbb{R}^m$ is certainly no larger than the norm of that vector.
Therefore it is evident, as a consequence of (12.2-3), that

$$\lim_{h \to 0} \frac{\bigl|f^{(i)}(a + h) - f^{(i)}(a) - \sum_{j=1}^{n} A_{ij}h_j\bigr|}{\|h\|} = 0. \qquad (12.2\text{-}4)$$

By (12.1-2) and (12.1-3) this means precisely that $f^{(i)}$ is differentiable at $a$ and
that

$$A_{ij} = \frac{\partial f^{(i)}}{\partial x_j}, \qquad (12.2\text{-}5)$$

the derivatives being evaluated at $a$.

Conversely, let us start with the assumption that each component function is
differentiable. This implies that the first partial derivatives all exist and that
(12.2-4) is satisfied for each $i$, where the numbers $A_{ij}$ are given by (12.2-5). It
follows from this that (12.2-3) is satisfied, because the Euclidean norm of a
vector is no larger than the sum of the absolute values of its components. [See
the first inequality in (7-5).] This completes the proof.


As we know from examples in Chapter 7, the mere existence of the first
partial derivatives of a component function $f^{(i)}$ at the point $\mathbf{x} = \mathbf{a}$ is not sufficient
to make $f^{(i)}$ differentiable at $\mathbf{a}$. However, as was stated in Theorem II in §7.1, if a
function from $\mathbb{R}^n$ to $\mathbb{R}$, defined in a neighborhood of $\mathbf{a}$, has first partial
derivatives, not just at $\mathbf{a}$, but at each point in the neighborhood, and if these
derivatives are continuous at $\mathbf{a}$, then the function is differentiable at $\mathbf{a}$. (A proof
of this proposition, for the special case $n = 3$, is asked for in Exercise 3.) As a
consequence, we can be assured that the function $\mathbf{f}$ in Theorem III is differentiable
at $\mathbf{a}$ if all the partial derivatives in the Jacobian matrix (12.2-2) exist
throughout a neighborhood of $\mathbf{a}$ and are continuous at $\mathbf{a}$. This is a useful way of
testing for differentiability in practice.
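
As an illustrative sketch (not part of the original text), the Jacobian matrix (12.2-2) can be approximated by difference quotients and compared with the matrix of partial derivatives computed by hand. The sample map `f` from $\mathbb{R}^2$ to $\mathbb{R}^3$, the point `a`, and the helper names `exact_jacobian` and `numerical_jacobian` are hypothetical choices made only for this illustration.

```python
import math

def f(x):
    """A sample map from R^2 to R^3 (chosen only for illustration)."""
    x1, x2 = x
    return [x1 * x2, math.sin(x1) + x2 ** 2, math.exp(x1 - x2)]

def exact_jacobian(x):
    """Matrix of first partial derivatives of f, computed by hand."""
    x1, x2 = x
    return [
        [x2, x1],
        [math.cos(x1), 2.0 * x2],
        [math.exp(x1 - x2), -math.exp(x1 - x2)],
    ]

def numerical_jacobian(func, x, step=1e-6):
    """Approximate the m-by-n Jacobian of func at x by central differences."""
    n = len(x)
    m = len(func(x))
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        forward = list(x)
        backward = list(x)
        forward[j] += step
        backward[j] -= step
        f_plus = func(forward)
        f_minus = func(backward)
        for i in range(m):
            # Row i, column j holds the partial of f^(i) with respect to x_j.
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * step)
    return jac

a = [0.5, -1.0]
approx = numerical_jacobian(f, a)
exact = exact_jacobian(a)
max_error = max(
    abs(approx[i][j] - exact[i][j]) for i in range(3) for j in range(2)
)
print("largest entrywise difference:", max_error)
```

Agreement of the two matrices is only a numerical check on the hand computation; it does not by itself establish differentiability, which still rests on the continuity of the partial derivatives as described above.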

There are two particular cases that deserve special mention. One is the case
in which $n = 1$; the other is the case in which $m = 1$.

When $n = 1$, the vector $\mathbf{x}$ becomes a real variable $x$ and the component
functions are real functions of a real variable. In this case the Jacobian matrix
(12.2-2) has just one column. Its elements are the ordinary (not partial) derivatives
with respect to $x$ of the component functions. If the derivatives at $x = a$ of
the component functions are $A_1, \dots, A_m$, the derivative $\mathbf{f}'(a)$ maps the scalar $h$
into the vector $(A_1 h, \dots, A_m h)$ and we can regard $\mathbf{f}'(a)$ as the vector
$(A_1, \dots, A_m)$.

When $m = 1$ we have a scalar function $f$ of the vector $\mathbf{x}$. In this case the
Jacobian matrix representing $f'(\mathbf{a})$ has one row and $n$ columns, and can be
regarded as a vector $(f_1(\mathbf{a}), \dots, f_n(\mathbf{a}))$, where

$$f_j(\mathbf{a}) = \frac{\partial f}{\partial x_j} \quad \text{evaluated at } \mathbf{x} = \mathbf{a}.$$

The differential is

$$df(\mathbf{a}, \mathbf{h}) = f'(\mathbf{a})\mathbf{h} = \sum_{j=1}^{n} f_j(\mathbf{a})h_j. \tag{12.2-6}$$

In §10.6 we defined the gradient of a scalar function defined on an open set
in $\mathbb{R}^3$. It is natural to extend that definition by defining the gradient of a scalar
function $f$ from $\mathbb{R}^n$ to $\mathbb{R}$ as the vector function grad $f$ from $\mathbb{R}^n$ to $\mathbb{R}^n$ with
components given by the partial derivatives:

$$\operatorname{grad} f(\mathbf{x}) = \Bigl(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\Bigr), \tag{12.2-7}$$

when $f$ is differentiable at $\mathbf{x}$; in (12.2-7) the partial derivatives are evaluated at $\mathbf{x}$. If
we view grad $f(\mathbf{x})$ as a one-rowed matrix, we see that it represents $f'(\mathbf{x})$. When
$f'(\mathbf{x})$ is applied to a vector $\mathbf{h}$ the resulting scalar can be viewed as a dot product:

$$f'(\mathbf{x})\mathbf{h} = (\operatorname{grad} f(\mathbf{x})) \cdot \mathbf{h}. \tag{12.2-8}$$

As a matter of notational convenience we shall denote the value of grad $f(\mathbf{x})$
when $\mathbf{x} = \mathbf{a}$ as grad $f(\mathbf{a})$.

In the next section we shall discuss an application of the gradient to the 
problem of finding a point where a scalar function attains a maximum or 
minimum value. 

12.21 / DIRECTIONAL DERIVATIVES 
AND THE METHOD OF STEEPEST DESCENT 

Just as we say that a nonzero vector in $\mathbb{R}^3$ determines a direction (the direction of the
arrow that represents the vector), so we shall say that a nonzero vector in
$\mathbb{R}^n$ determines a direction in $\mathbb{R}^n$. Now consider a scalar function $f$, defined and
continuous on some open set in $\mathbb{R}^n$. Because $f$ is continuous, the change in the
value of $f(\mathbf{x})$ as we move in any given direction from a particular point will be
gradual, and will be small if the change in distance is small. If $\mathbf{u}$ is a unit vector
and $\mathbf{a}$ is a point in the domain of $f$, we define

$$D_{\mathbf{u}}f(\mathbf{a}) = \lim_{t \to 0+} \frac{f(\mathbf{a}+t\mathbf{u}) - f(\mathbf{a})}{t}, \tag{12.21-1}$$

provided the limit exists, as the directional derivative of $f$ at $\mathbf{a}$ in the direction of
$\mathbf{u}$. In taking the limit in (12.21-1) it is assumed that $t$ is positive and small enough
to assure that $\mathbf{a} + t\mathbf{u}$ is always in the domain of $f$. The directional derivative is to
be interpreted as the rate of change per unit of distance, at $\mathbf{a}$, of the value of $f$, in
the direction of $\mathbf{u}$. This is in accord with the standard interpretation of the
derivative in elementary calculus, because the distance between $\mathbf{a}+t\mathbf{u}$ and $\mathbf{a}$ is

$$\|(\mathbf{a}+t\mathbf{u}) - \mathbf{a}\| = \|t\mathbf{u}\| = t\|\mathbf{u}\| = t.$$

In the case in which $f$ has a first partial derivative with respect to $x_j$ at $\mathbf{a}$, it is
readily seen that this partial derivative is equal to the directional derivative at $\mathbf{a}$
in the direction of the standard basis vector $\mathbf{e}_j$, for in that case the only
difference between $\mathbf{a} + t\mathbf{e}_j$ and $\mathbf{a}$ is in the $j$th co-ordinates, the $j$th co-ordinate of $\mathbf{a} + t\mathbf{e}_j$
being $a_j + t$ and that of $\mathbf{a}$ being $a_j$ [where $\mathbf{a} = (a_1, \dots, a_n)$].

There is an important relationship between directional derivatives and the
gradient. For $\mathbb{R}^3$ this was mentioned in §10.6. The situation in $\mathbb{R}^n$ is stated in the
next theorem.

THEOREM IV. Suppose that $f$, from $\mathbb{R}^n$ to $\mathbb{R}$, is differentiable at $\mathbf{a}$. Then, for
any direction $\mathbf{u}$ (where $\mathbf{u}$ is a unit vector),

$$D_{\mathbf{u}}f(\mathbf{a}) = [\operatorname{grad} f(\mathbf{a})] \cdot \mathbf{u}. \tag{12.21-2}$$

Moreover, when grad $f(\mathbf{a})$ is not zero, its direction is that in which $f(\mathbf{x})$
increases most rapidly at $\mathbf{a}$, and this greatest rate of increase is equal to
$\|\operatorname{grad} f(\mathbf{a})\|$.


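Proof. We can obtain (12.21-2) directly from the definition of differentiability
of $f$ by applying (12.2-3) to this case with $\mathbf{h} = t\mathbf{u}$ and $t > 0$. Then $\|\mathbf{h}\| \to 0$
means $t \to 0+$. Therefore we obtain

$$\lim_{t \to 0+} \frac{|f(\mathbf{a}+t\mathbf{u}) - f(\mathbf{a}) - f'(\mathbf{a})(t\mathbf{u})|}{t} = 0. \tag{12.21-3}$$

But, by (12.2-8), $f'(\mathbf{a})(t\mathbf{u}) = t f'(\mathbf{a})(\mathbf{u}) = t \operatorname{grad} f(\mathbf{a}) \cdot \mathbf{u}$, and so we see from (12.21-3)
that

$$\lim_{t \to 0+} \left| \frac{f(\mathbf{a}+t\mathbf{u}) - f(\mathbf{a})}{t} - \operatorname{grad} f(\mathbf{a}) \cdot \mathbf{u} \right| = 0.$$

From this result and the definition in (12.21-1) we conclude the truth of
(12.21-2).

To prove the assertion in the final sentence of Theorem IV we observe that,
by (10.2-5),

$$|[\operatorname{grad} f(\mathbf{a})] \cdot \mathbf{u}| \le \|\operatorname{grad} f(\mathbf{a})\| \, \|\mathbf{u}\| = \|\operatorname{grad} f(\mathbf{a})\|,$$

because $\|\mathbf{u}\| = 1$. Moreover, by Exercise 4b in §10.12, the foregoing inequality
becomes an equality if and only if grad $f(\mathbf{a})$ is a multiple of $\mathbf{u}$. But, if grad $f(\mathbf{a}) = c\mathbf{u}$,
we must have $|c| = \|\operatorname{grad} f(\mathbf{a})\|$. Therefore $c = \pm\|\operatorname{grad} f(\mathbf{a})\|$. We see in this
way that the maximum possible value of $|D_{\mathbf{u}}f(\mathbf{a})|$ is $\|\operatorname{grad} f(\mathbf{a})\|$ and that this
maximum is attained if and only if

$$\mathbf{u} = \pm \frac{\operatorname{grad} f(\mathbf{a})}{\|\operatorname{grad} f(\mathbf{a})\|}.$$

With these two choices of sign we get $D_{\mathbf{u}}f(\mathbf{a}) = \pm\|\operatorname{grad} f(\mathbf{a})\|$. When the plus sign
is chosen $\mathbf{u}$ has the same direction as grad $f(\mathbf{a})$ and we obtain a positive rate of
change which is the greatest possible. The other choice of sign, for the
oppositely directed $\mathbf{u}$, yields a negative rate of change. In this discussion we
have validated the final assertion of the theorem.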
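
The statement of Theorem IV can be checked numerically. The following is a minimal sketch, assuming a sample function `f`, a sample point `a`, and a small positive `t` in the difference quotient of (12.21-1); all of these, and the helper name `directional_derivative`, are hypothetical choices for illustration only.

```python
import math

def f(x, y):
    """Sample differentiable scalar function on R^2 (illustrative choice)."""
    return x ** 2 * y + math.sin(y)

def grad_f(x, y):
    """Gradient of f, computed by hand."""
    return (2.0 * x * y, x ** 2 + math.cos(y))

def directional_derivative(func, a, u, t=1e-7):
    """One-sided difference quotient from (12.21-1) with a small positive t."""
    return (func(a[0] + t * u[0], a[1] + t * u[1]) - func(a[0], a[1])) / t

a = (1.0, 2.0)
g = grad_f(*a)
g_norm = math.hypot(g[0], g[1])

# Compare the difference quotient with grad f(a) . u for many unit vectors u.
worst_gap = 0.0
largest_rate = -math.inf
for k in range(360):
    angle = math.radians(k)
    u = (math.cos(angle), math.sin(angle))
    from_gradient = g[0] * u[0] + g[1] * u[1]
    from_quotient = directional_derivative(f, a, u)
    worst_gap = max(worst_gap, abs(from_gradient - from_quotient))
    largest_rate = max(largest_rate, from_gradient)

# Rate of change in the direction of the gradient itself.
u_star = (g[0] / g_norm, g[1] / g_norm)
rate_along_gradient = g[0] * u_star[0] + g[1] * u_star[1]

print("worst gap between quotient and grad . u:", worst_gap)
print("rate along gradient:", rate_along_gradient, "norm of gradient:", g_norm)
```

None of the sampled directions gives a rate exceeding the norm of the gradient, and the direction of the gradient attains that norm, in line with the final assertion of the theorem.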


The relation of the gradient of $f$ to rates of change of $f$ in various directions,
as indicated in Theorem IV, is the basis of a procedure called the method of
steepest descent, which is of great practical importance in finding extreme values
of functions from $\mathbb{R}^n$ to $\mathbb{R}$. The essential ideas can be seen in the case where
$n = 2$. Suppose we wish to find a minimum value of the differentiable function
$f(x, y)$. The first step is to make as good a guess as we can as to a point where a
minimum value is taken on. Call this initial guess $(x_0, y_0)$ and evaluate
grad $f(x_0, y_0)$. If $(x_0, y_0)$ is not a critical point of $f$, grad $f(x_0, y_0)$ is not the zero
vector, and the value of $f$ will decrease from its value at $(x_0, y_0)$ if we move to a
point $(x_1, y_1)$ not too far from $(x_0, y_0)$ in the direction of $-\operatorname{grad} f(x_0, y_0)$, this being
the direction of most rapid decrease of $f$ at $(x_0, y_0)$. Thus

$$(x_1, y_1) = (x_0, y_0) - \alpha_0 \operatorname{grad} f(x_0, y_0),$$

where $\alpha_0$ is some suitably chosen positive number, not too large, will be a point where
$f(x_1, y_1) < f(x_0, y_0)$. If $(x_1, y_1)$ is not a critical point of $f$, this process can be
repeated, leading us to $(x_2, y_2) = (x_1, y_1) - \alpha_1 \operatorname{grad} f(x_1, y_1)$, and so on. Continuing
in this way we generate a sequence of points $\{(x_n, y_n)\}_{n=0}^{\infty}$ along which the
function $f$ decreases as $n$ increases. The points $(x_n, y_n)$ can then be expected to
converge to a critical point, if there is one, where $f$ has a minimum value (at
least a local minimum). The sequence of values $\{f(x_n, y_n)\}_{n=0}^{\infty}$ will then decrease,
approaching the desired minimum value as a limit. The sequence of successive
approximations is generated by the formula

$$(x_{n+1}, y_{n+1}) = (x_n, y_n) - \alpha_n \operatorname{grad} f(x_n, y_n), \tag{12.21-4}$$

in which the $\alpha_n$’s must be chosen as advantageously as possible.

In some of the more complicated problems of practical interest, the $\alpha_n$’s are
chosen mainly by experimenting on the computer. If the function $f$ is rather
simple and twice differentiable, a formula can be derived which sometimes
generates a good sequence of $\alpha_n$’s. Suppose that $(x_n, y_n)$ has been found. We
need a number $\alpha_n$ such that (12.21-4) will produce a point $(x_{n+1}, y_{n+1})$ where $f$ has
a smaller value than at $(x_n, y_n)$. Notice that

$$f(x_{n+1}, y_{n+1}) = f\bigl(x_n - \alpha_n f_x^{(n)},\ y_n - \alpha_n f_y^{(n)}\bigr),$$

where $f_x^{(n)} = f_x(x_n, y_n)$ and $f_y^{(n)} = f_y(x_n, y_n)$. We now approximate $f(x_{n+1}, y_{n+1})$ by the
Taylor series, using formula (7.5-5) with $a = x_n$, $b = y_n$, $h = -\alpha_n f_x^{(n)}$ and
$k = -\alpha_n f_y^{(n)}$. Then $x_{n+1} = x_n + h$ and $y_{n+1} = y_n + k$. Taking just the first three terms of
the Taylor series we get the following approximation:

$$f(x_{n+1}, y_{n+1}) \approx f(x_n, y_n) - \alpha_n\bigl[(f_x^{(n)})^2 + (f_y^{(n)})^2\bigr] + \tfrac{1}{2}\alpha_n^2 Q_n,$$

where

$$Q_n = f_{xx}^{(n)}(f_x^{(n)})^2 + 2 f_{xy}^{(n)} f_x^{(n)} f_y^{(n)} + f_{yy}^{(n)}(f_y^{(n)})^2,$$

in which $f_{xx}^{(n)} = \dfrac{\partial^2 f}{\partial x^2}$, $f_{xy}^{(n)} = \dfrac{\partial^2 f}{\partial x\,\partial y}$ and $f_{yy}^{(n)} = \dfrac{\partial^2 f}{\partial y^2}$ are all evaluated at $(x_n, y_n)$. If we are
getting close to the point of minimum value of $f(x, y)$, we may assume that $Q_n$ is
positive (see the discussion of the sign of $G(\phi)$ in connection with (7.6-5)).
Moreover, if $\alpha_n$ is sufficiently small, the term $\tfrac{1}{2}\alpha_n^2 Q_n$ should be so small that we
can be assured that $f(x_{n+1}, y_{n+1}) < f(x_n, y_n)$ if $\alpha_n$ is positive. Under these conditions,
a good choice for $\alpha_n$ will be that for which the expression

$$-\alpha_n\bigl[(f_x^{(n)})^2 + (f_y^{(n)})^2\bigr] + \tfrac{1}{2}\alpha_n^2 Q_n$$

is a minimum. Setting the derivative of this expression with respect to $\alpha_n$ equal
to zero shows that this occurs when

$$\alpha_n = \frac{(f_x^{(n)})^2 + (f_y^{(n)})^2}{Q_n}. \tag{12.21-5}$$


To see that this method actually can be carried out, we go back to Exercise
13 of §6.3. Using a programmable pocket calculator, a program of fewer than 200
steps can be written to generate the sequence (12.21-4) using the $\alpha_n$’s determined
by (12.21-5). The program can also have the machine print out $f(x_n, y_n)$ and
$\|\operatorname{grad} f(x_n, y_n)\|$ at each stage. If we start with the initial guess $(x_0, y_0) = (1, 1)$, the
following results are obtained.


  n   x_n           y_n           ||grad f(x_n, y_n)||   f(x_n, y_n)
  0   1             1             10.81665               -25
  1   1.6724        1.4483        5.8338                 -28.5712
  2   1.68995       1.2552        2.64037                -29.5574
  3   1.9754        1.10243       1.74862                -29.9168
  4   1.95826       1.01917       0.39027                -29.9912
  5   1.99880       1.00946       0.16869                -29.9993
  6   1.99673       1.00153       0.03104                -29.9999
  7   1.99992       1.00069       0.01233                -29.99999602
  8   1.99977       1.00011       0.00223                -29.99999972
  9   1.99999       1.00005       0.00088                -29.99999998
 10   1.99998       1.00001       0.00016                -30
 11   1.999999597   1.000003484   0.0000624022           -30
 12   1.999998817   1.000000555   0.0000112465           -30
 13   1.999999971   1.000000248   0.000004434            -30
 14   1.999999916   1.000000039   0.000007991            -30
 15   1.999999998   1.000000018   0.000000315            -30
 16   1.999999994   1.000000003   0.000000567            -30
 17   2.000000000   1.000000001   0.000000223            -30

The minimum value of -30 is found to 10 digits in 10 steps, but to get the
critical point with equal accuracy takes 18 steps.

The convergence here is tediously slow. A faster method will be given in 
§12.3. However, with the method of steepest descent one can get convergence 
even if the initial guess is not very good, whereas the faster method may not 
converge at all if one starts from a bad initial guess. This possibility will be 
illustrated in §12.3. 

In conclusion, we make the obvious comment that the same ideas lead easily
to a method of steepest ascent which is useful in case one is trying to find a
maximum rather than a minimum (Exercise 25). Finally, we should make it clear
that formula (12.21-5) is just one of various methods for choosing the $\alpha$’s. There
are several reasons why it can fail to give satisfactory results in certain
problems. When this happens, one can frequently make a good guess for $\alpha$ at
each stage by looking at the behavior of the points $(x_n, y_n)$ up to that stage.

12.3 / NEWTON’S METHOD 

In this section we illustrate the use of the derivative of a function $\mathbf{f}$ from $\mathbb{R}^n$ to
$\mathbb{R}^n$ in seeking a solution to the equation $\mathbf{f}(\mathbf{x}) = \mathbf{0}$ by a method of successive
approximations known as Newton’s method, or also as the Newton-Raphson
method. The work illustrates very nicely how a formula from elementary
calculus of functions of one variable can be extended directly to the multivariable
case because of the way in which we have defined the derivative of a
transformation.

We begin with the elementary case in which $n = 1$. Suppose we have a
differentiable real function $f$ of the real variable $x$ and we wish to find a solution
of the equation $f(x) = 0$. For the type of problem in which we are interested it
will generally be known that there is at least one $x$ such that $f(x) = 0$. We may
even have some rough notion of the value of such an $x$; the problem then is to
obtain a fairly accurate approximation to its value. For example, if $f(x) =
x^3 - 3x^2 + 3$, we can see that $f(2) = -1$ and $f(3) = 3$, so that $f(x) = 0$ for some $x$
such that $2 < x < 3$, and we want to find it, accurate to several places of
decimals. Or, if $f(x) = e^x + x - 2$, we see that $f(0) = -1$ and $f(1) \approx 1.718$, so there
is an $x$ such that $0 < x < 1$ and $f(x) = 0$. What is its value more precisely? In
many such cases, if we start with a guessed value $x_0$ of the solution, we can use
Newton’s method to generate a succession of points $x_1, x_2, \dots, x_n, \dots$ such that
the sequence $\{x_n\}$ will converge to a limit $x$ such that $f(x) = 0$. The idea of the
method is to find the line tangent to the curve $y = f(x)$ at the point $(x_0, f(x_0))$, and
then find the point $x_1$ where this tangent intersects the $x$-axis. Then we can
repeat the operation with the line tangent to $y = f(x)$ at $(x_1, f(x_1))$, and so on. See
Fig. 86. In each of the particular examples cited above, the portion of the graph
near its intersection with the $x$-axis has a general resemblance to the graph in
Fig. 86, and if we start with a guessed value $x_0$ that is not too far from the
abscissa of the intersection, then $x_n$ will converge to the solution. The formula
for the successive $x_n$’s is easy to find. The equation of the tangent at $(x_0, f(x_0))$ is

$$y - f(x_0) = f'(x_0)(x - x_0).$$

If we set $y = 0$ and solve for $x$ we find

$$x = x_0 - \frac{f(x_0)}{f'(x_0)}.$$

The general formula is

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \qquad n = 0, 1, 2, \dots \tag{12.3-1}$$

Fig. 86.

At this point it will be instructive to the student to carry out the successive steps
of Newton’s method on a particular problem. As a simple example, let $f(x) =
x^3 - 20$. To find the solution of $x^3 - 20 = 0$ is to find the cube root of 20. Simple
calculations with a handheld computer show that the root of $f(x) = 0$ is between
2.7 and 2.8, and closer to 2.7. If we guess $x_0 = 2.7$ and calculate $x_1$ and $x_2$ we find
$x_2 = 2.71442$, which is a solution accurate to five decimal places. If we make the
initial guess $x_0 = 2.8$, we get the same solution as $x_3$ instead of $x_2$. For a less
simple problem that is still easily handled by a pocket calculator, see Exercise
26.
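
The cube-root computation just described can be sketched as follows. This is only an illustration of formula (12.3-1); the helper name `newton` and the choice of four steps are assumptions made for the sketch.

```python
def newton(func, dfunc, x0, steps):
    """Return [x_0, x_1, ..., x_steps] generated by formula (12.3-1)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        slope = dfunc(x)
        if slope == 0.0:
            raise ZeroDivisionError("tangent line is horizontal; (12.3-1) is undefined")
        xs.append(x - func(x) / slope)
    return xs

def f(x):
    return x ** 3 - 20.0

def f_prime(x):
    return 3.0 * x ** 2

from_27 = newton(f, f_prime, 2.7, 4)
from_28 = newton(f, f_prime, 2.8, 4)
for n, (p, q) in enumerate(zip(from_27, from_28)):
    # Column 1: iterates starting at 2.7; column 2: iterates starting at 2.8.
    print(n, p, q)
```

Printing the two sequences side by side makes it easy to compare how quickly each starting guess settles on the cube root of 20.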

We shall not deal with theorems about sufficient conditions to insure that the
sequence $\{x_n\}$ generated by Newton’s method will actually converge to a root of
the equation $f(x) = 0$, although there are theorems of this kind. The interested
reader will find an excellent brief treatment of such a theorem (for a real
function of a real variable) in Infinitesimal Calculus, by Jean Dieudonné
(Houghton Mifflin, 1971).

When we move to the general case of solving an equation of the form
$\mathbf{f}(\mathbf{x}) = \mathbf{0}$, where $\mathbf{f}$ has its domain and range in $\mathbb{R}^n$, the problem is that of finding
$\mathbf{x} = (x_1, \dots, x_n)$ as the solution of a set of simultaneous equations in $n$ unknowns.
If the equations are not linear this can be a formidable problem. In general we
cannot expect to be able to solve such a problem by algebra, even if the $n$
equations themselves are algebraic.

One way in which such equations arise is when we undertake to find the
critical points of a scalar function. Suppose, for example, we want to find the
critical points of the function

$$F(x, y) = x^3 + y^3 + 3xy^2 - 15x - 15y$$

in the interior of the first quadrant. The critical points are those points where the
vector

$$\operatorname{grad} F(x, y) = \Bigl(\frac{\partial F}{\partial x}, \frac{\partial F}{\partial y}\Bigr) = (0, 0). \tag{12.3-2}$$

By direct calculation we find

$$\operatorname{grad} F(x, y) = 3\,(x^2 + y^2 - 5,\ y^2 + 2xy - 5),$$

and so the equation (12.3-2) is equivalent to the two simultaneous equations

$$\begin{aligned} x^2 + y^2 - 5 &= 0, \\ 2xy + y^2 - 5 &= 0. \end{aligned} \tag{12.3-3}$$

These equations happen to be easy to solve by algebra. The only critical point in
the interior of the first quadrant is (2, 1). Later, in Exercise 27, the student can
try out the method of Newton on this problem, and observe how it can lead from
a first guess at the solution to a highly accurate approximation to the exact
solution.

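For the general case of $\mathbf{f}$ from $\mathbb{R}^n$ to $\mathbb{R}^n$, Newton’s method proceeds in a
manner that is entirely analogous to the special case $n = 1$. A start is made with a
guessed approximation $\mathbf{x}_0$ to the solution of $\mathbf{f}(\mathbf{x}) = \mathbf{0}$. Then we use $\mathbf{f}'(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)$ as
an approximation to $\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x}_0)$, as is warranted by the definition of the derivative.
Then we set $\mathbf{f}(\mathbf{x}) = \mathbf{0}$ in the approximate formula

$$\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x}_0) \approx \mathbf{f}'(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)$$

and solve for $\mathbf{x}$, denoting the solution by $\mathbf{x}_1$. To achieve the solution, we assume
that the linear operator $\mathbf{f}'(\mathbf{x}_0)$ has an inverse, $[\mathbf{f}'(\mathbf{x}_0)]^{-1}$, so that the equation

$$\mathbf{0} - \mathbf{f}(\mathbf{x}_0) = \mathbf{f}'(\mathbf{x}_0)(\mathbf{x}_1 - \mathbf{x}_0) \tag{12.3-4}$$

leads to the formula

$$\mathbf{x}_1 = \mathbf{x}_0 - [\mathbf{f}'(\mathbf{x}_0)]^{-1}\mathbf{f}(\mathbf{x}_0). \tag{12.3-5}$$

We emphasize that in (12.3-4) the operator $\mathbf{f}'(\mathbf{x}_0)$ is acting on the vector $\mathbf{x}_1 - \mathbf{x}_0$
and that in (12.3-5) the operator $[\mathbf{f}'(\mathbf{x}_0)]^{-1}$ is acting on the vector $\mathbf{f}(\mathbf{x}_0)$. We then
proceed by a repetition of the process, obtaining a sequence of vectors $\mathbf{x}_1$,
$\mathbf{x}_2, \dots, \mathbf{x}_n, \dots$, where

$$\mathbf{x}_{n+1} = \mathbf{x}_n - [\mathbf{f}'(\mathbf{x}_n)]^{-1}\mathbf{f}(\mathbf{x}_n). \tag{12.3-6}$$

It is assumed that $\mathbf{f}'(\mathbf{x}_n)$ has an inverse for each $\mathbf{x}_n$. Sufficient conditions for the
convergence of this vector form of Newton’s method are given in advanced
texts on numerical analysis.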

We conclude this section with an application of this powerful method to a 
two-dimensional problem, carried out on a programmable pocket calculator. 

Example. We wish to solve the system

$$\begin{aligned} -13 + x - 2y + 5y^2 - y^3 &= 0, \\ -29 + x - 14y + y^2 + y^3 &= 0. \end{aligned}$$

Notice that by the elementary method of eliminating $x$ and factoring the
resulting cubic in $y$, we find immediately the solution (5, 4), and we also see that
this is the only solution which involves only real numbers. Thinking of the
system as $\mathbf{f}(x, y) = \mathbf{0}$, we get

$$\mathbf{f}'(x, y) = \begin{pmatrix} 1 & -2 + 10y - 3y^2 \\ 1 & -14 + 2y + 3y^2 \end{pmatrix}.$$

The determinant of this matrix is $6y^2 - 8y - 12$, so $\mathbf{f}'$ is a nonsingular linear
operator except on the lines $y = \tfrac{1}{3}(2 \pm \sqrt{22})$, that is, approximately, $y = 2.23$ and
$y = -0.897$. The inverse is given by (see §11.9)

$$[\mathbf{f}'(x, y)]^{-1} = \begin{pmatrix} \dfrac{-14 + 2y + 3y^2}{6y^2 - 8y - 12} & \dfrac{2 - 10y + 3y^2}{6y^2 - 8y - 12} \\[2ex] \dfrac{-1}{6y^2 - 8y - 12} & \dfrac{1}{6y^2 - 8y - 12} \end{pmatrix},$$

so Newton’s method, expressed by (12.3-6), gives the sequence of vectors
generated by the following formula:

$$\begin{pmatrix} x_{n+1} \\ y_{n+1} \end{pmatrix} = \begin{pmatrix} x_n \\ y_n \end{pmatrix} - [\mathbf{f}'(x_n, y_n)]^{-1} \begin{pmatrix} -13 + x_n - 2y_n + 5y_n^2 - y_n^3 \\ -29 + x_n - 14y_n + y_n^2 + y_n^3 \end{pmatrix}.$$

A program to generate this sequence with a pocket calculator can be written
with fewer than 200 steps. Starting with the initial guess $(x_0, y_0) = (10, 8)$ the
following results are generated.


 n    x_n            y_n
 0    10             8
 1    -21.805        5.8701
 2    -2.2100        4.6503
 3    3.9071         4.1187
 4    4.95746        4.00507
 5    4.999918991    4.00000987
 6    5.000000000    4.000000000

This shows that convergence from the starting point (10, 8) is quite rapid, giving
accuracy to 10 significant digits in just six steps. Such success cannot be assured
for all starting points, however. Notice the very different behavior of the
sequence generated when we start with $(x_0, y_0) = (15, -2)$.
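
The pocket-calculator program is not given in the text. As a rough modern stand-in, the following sketch applies (12.3-6) to the system of the example, using the explicit inverse of the 2-by-2 matrix; the function names `jacobian`, `newton_step` and `newton_sequence` are hypothetical.

```python
def f(x, y):
    """Left-hand sides of the two equations of the example."""
    return (
        -13.0 + x - 2.0 * y + 5.0 * y ** 2 - y ** 3,
        -29.0 + x - 14.0 * y + y ** 2 + y ** 3,
    )

def jacobian(x, y):
    """Matrix of f'(x, y); the first column is (1, 1) because x enters linearly."""
    return (
        (1.0, -2.0 + 10.0 * y - 3.0 * y ** 2),
        (1.0, -14.0 + 2.0 * y + 3.0 * y ** 2),
    )

def newton_step(x, y):
    """One application of (12.3-6), using the explicit 2-by-2 inverse."""
    (a, b), (c, d) = jacobian(x, y)
    det = a * d - b * c  # equals 6y^2 - 8y - 12
    if det == 0.0:
        raise ZeroDivisionError("f'(x, y) is singular at this point")
    f1, f2 = f(x, y)
    dx = (d * f1 - b * f2) / det
    dy = (-c * f1 + a * f2) / det
    return x - dx, y - dy

def newton_sequence(x0, y0, steps):
    """Return the list of points (x_0, y_0), ..., (x_steps, y_steps)."""
    points = [(x0, y0)]
    for _ in range(steps):
        points.append(newton_step(*points[-1]))
    return points

for n, (x, y) in enumerate(newton_sequence(10.0, 8.0, 6)):
    print(n, x, y)
```

Calling `newton_sequence(15.0, -2.0, k)` for some number of steps `k` lets one explore the second starting point mentioned in the text; no output for that case is asserted here.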


12.4 / A FORM OF THE LAW OF 
THE MEAN FOR VECTOR FUNCTIONS 

In this section we obtain (Theorem V) a generalization of the law of the mean (as
presented in §1.2 and §7.4), and an application of it in the form of an inequality
(Theorem VI); both are applicable to differentiable functions from $\mathbb{R}^n$ to $\mathbb{R}^m$. We
need the notion of a line segment in $\mathbb{R}^n$ determined by two points, $\mathbf{u}$ and $\mathbf{v}$. It
consists of all points of the form

$$\mathbf{x} = \mathbf{u} + t(\mathbf{v} - \mathbf{u}) \quad \text{where } 0 \le t \le 1.$$

(In this connection, see Exercise 16 of Chapter 11.) The above line segment is
closed because it contains both end points. We shall sometimes denote it by
$[\mathbf{u}, \mathbf{v}]$. The corresponding open segment, which we denote by $(\mathbf{u}, \mathbf{v})$, is obtained
by restricting $t$ to the open interval $0 < t < 1$. The situation is portrayed in Fig.
87.

Fig. 87.

The vector $\mathbf{v} - \mathbf{u}$ extends from $\mathbf{u}$ to $\mathbf{v}$, so, if $0 < t < 1$, $\mathbf{u}$ plus $t$ times $(\mathbf{v} - \mathbf{u})$
terminates at some point on the segment from $\mathbf{u}$ to $\mathbf{v}$. As $t$ increases from 0 to 1,
$\mathbf{x}$ moves from $\mathbf{u}$ to $\mathbf{v}$.

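THEOREM V. Let $\mathbf{f}$ be a function from an open set $G$ in $\mathbb{R}^n$ to $\mathbb{R}^m$. Let $\mathbf{u}$ and $\mathbf{v}$
be two points of $G$ such that the closed line segment $[\mathbf{u}, \mathbf{v}]$ lies in $G$. Suppose
that $\mathbf{f}$ is continuous at each point of the closed segment and differentiable at
each point of the open segment $(\mathbf{u}, \mathbf{v})$. Then to each vector $\mathbf{w}$ in $\mathbb{R}^m$
corresponds a vector $\boldsymbol{\xi}$ terminating on the open segment $(\mathbf{u}, \mathbf{v})$ such that

$$[\mathbf{f}(\mathbf{v}) - \mathbf{f}(\mathbf{u})] \cdot \mathbf{w} = [\mathbf{f}'(\boldsymbol{\xi})(\mathbf{v} - \mathbf{u})] \cdot \mathbf{w}. \tag{12.4-1}$$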

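Proof. Define a function $\mathbf{F}$ of $t$ from $[0, 1]$ to $\mathbb{R}^m$ by $\mathbf{F}(t) = \mathbf{f}[\mathbf{u} + t(\mathbf{v} - \mathbf{u})]$. It is
easily seen that $\mathbf{F}$ is continuous on $[0, 1]$ and differentiable on $(0, 1)$. We can
regard $\mathbf{F}'(t)$ as a vector in $\mathbb{R}^m$ (see §12.2). In this particular case, by the chain
rule,

$$\mathbf{F}'(t) = \mathbf{f}'[\mathbf{u} + t(\mathbf{v} - \mathbf{u})](\mathbf{v} - \mathbf{u})$$

when $0 < t < 1$, where the value of the derivative of $\mathbf{f}$ is a member of $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$
acting on $\mathbf{v} - \mathbf{u}$. Now pick an arbitrary vector $\mathbf{w}$ in $\mathbb{R}^m$ and consider the
real-valued function $\phi(t) = \mathbf{w} \cdot \mathbf{F}(t)$. It is easy to see that $\phi$ is continuous on $[0, 1]$
and differentiable in $(0, 1)$; moreover, it is readily proved (see Exercise 7a) that
$\phi'(t) = \mathbf{w} \cdot \mathbf{F}'(t)$. Now obviously,

$$\phi(1) - \phi(0) = [\mathbf{f}(\mathbf{v}) - \mathbf{f}(\mathbf{u})] \cdot \mathbf{w}.$$

Since $\phi$ is just a differentiable function from $\mathbb{R}$ to $\mathbb{R}$, the ordinary law of the
mean tells us that there is some number $\theta$ such that $0 < \theta < 1$ and

$$\phi(1) - \phi(0) = \phi'(\theta) = \mathbf{w} \cdot \mathbf{F}'(\theta).$$

Thus we see that

$$[\mathbf{f}(\mathbf{v}) - \mathbf{f}(\mathbf{u})] \cdot \mathbf{w} = \{\mathbf{f}'[\mathbf{u} + \theta(\mathbf{v} - \mathbf{u})](\mathbf{v} - \mathbf{u})\} \cdot \mathbf{w}.$$

Letting $\boldsymbol{\xi} = \mathbf{u} + \theta(\mathbf{v} - \mathbf{u})$ we obtain (12.4-1), thus concluding the proof.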

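THEOREM VI. Under the assumptions of Theorem V, there is some point $\boldsymbol{\xi}$
on the line segment connecting $\mathbf{u}$ and $\mathbf{v}$ such that

$$\|\mathbf{f}(\mathbf{v}) - \mathbf{f}(\mathbf{u})\| \le \|\mathbf{f}'(\boldsymbol{\xi})(\mathbf{v} - \mathbf{u})\|, \tag{12.4-2}$$

where $\|\ \|$ of course denotes the usual Euclidean norm.

Proof. If $\mathbf{f}(\mathbf{u}) = \mathbf{f}(\mathbf{v})$, (12.4-2) is obviously true, because $\|\mathbf{0}\| = 0$ and the norm
of every vector is nonnegative. If $\mathbf{f}(\mathbf{u}) \ne \mathbf{f}(\mathbf{v})$, we start by taking the absolute
value of each side in (12.4-1) and applying Cauchy’s inequality (see (10.12-5)) to
the right-hand side.

$$|[\mathbf{f}(\mathbf{v}) - \mathbf{f}(\mathbf{u})] \cdot \mathbf{w}| = |[\mathbf{f}'(\boldsymbol{\xi})(\mathbf{v} - \mathbf{u})] \cdot \mathbf{w}| \le \|\mathbf{f}'(\boldsymbol{\xi})(\mathbf{v} - \mathbf{u})\| \, \|\mathbf{w}\|. \tag{12.4-3}$$

Then we choose $\mathbf{w}$ as follows:

$$\mathbf{w} = \frac{\mathbf{f}(\mathbf{v}) - \mathbf{f}(\mathbf{u})}{\|\mathbf{f}(\mathbf{v}) - \mathbf{f}(\mathbf{u})\|}.$$

As we can see, $\|\mathbf{w}\| = 1$ and

$$[\mathbf{f}(\mathbf{v}) - \mathbf{f}(\mathbf{u})] \cdot \mathbf{w} = \|\mathbf{f}(\mathbf{v}) - \mathbf{f}(\mathbf{u})\|.$$

Putting these results in (12.4-3) we obtain (12.4-2).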

12.41 / THE HESSIAN AND EXTREME VALUES 

In (6.9-10) we had occasion to consider a quadratic form in three variables. In $n$
variables, such a function could be expressed as

$$\begin{aligned} Q(h_1, \dots, h_n) ={}& a_{11}h_1^2 + a_{12}h_1h_2 + \cdots + a_{1n}h_1h_n \\ &+ a_{21}h_2h_1 + a_{22}h_2^2 + \cdots + a_{2n}h_2h_n \\ &+ \cdots \\ &+ a_{n1}h_nh_1 + \cdots + a_{nn}h_n^2. \end{aligned} \tag{12.41-1}$$

We can just as well assume that $a_{ij} = a_{ji}$, and this is ordinarily done. Taking
advantage of matrix notation and the rules of matrix multiplication, we can more
conveniently represent the quadratic form by

$$Q(h_1, \dots, h_n) = \mathbf{h}^T A \mathbf{h}, \tag{12.41-2}$$

where $\mathbf{h}^T$ is regarded as a $(1 \times n)$ matrix, $A$ is the $n \times n$ matrix $(a_{ij})$, and the
vector $\mathbf{h}$ is thought of as an $(n \times 1)$ matrix. The product of three such matrices, in
this order, is defined and comes out to be a scalar. The scalar in this case is the
quadratic form.

We have encountered quadratic forms earlier in studying Taylor series for
functions of several variables (§7.5). In the theory of maxima and minima, these
series expansions are usually made about a critical point of the function, where
all the first partial derivatives are zero. Suppose we expand $f(x_1, \dots, x_n)$ about a
critical point $\mathbf{a} = (a_1, \dots, a_n)$. Since the coefficients of all linear terms are zero,
the series begins as follows:

$$f(x_1, \dots, x_n) = f(a_1, \dots, a_n) + \frac{1}{2!}\Bigl[\Bigl(h_1\frac{\partial}{\partial x_1} + \cdots + h_n\frac{\partial}{\partial x_n}\Bigr)^2 f\Bigr]_{\mathbf{x} = \mathbf{a}} + \cdots,$$

where $h_k = x_k - a_k$. The second term can be written as $\tfrac{1}{2}Q(h_1, \dots, h_n)$, where $Q$ is
a quadratic form in the $h$’s, having as its coefficients the second order partial
derivatives of $f$ evaluated at $\mathbf{a}$. In fact, it is exactly the quadratic form (12.41-1)
if we take $a_{ij}$ to be $\dfrac{\partial^2 f}{\partial x_i\,\partial x_j}$, evaluated at $\mathbf{a}$. The above series can then be written

$$f(x_1, \dots, x_n) = f(a_1, \dots, a_n) + \tfrac{1}{2}\mathbf{h}^T A \mathbf{h} + \cdots.$$

This matrix $A$ is called the Hessian of $f$ at $\mathbf{a}$.

We learned in §7.6 that $f$ has local minima at those critical points where
$\mathbf{h}^T A \mathbf{h}$ is positive definite, and local maxima at critical points where this quadratic
form is negative definite. We then gave, without proof, a rule for deciding if a
quadratic form is positive definite, and a rule to tell if it is negative definite.
Since the quadratic form $Q$ and the symmetric matrix $A$ which represents it
completely determine each other, it is common practice to predicate positive
definiteness and negative definiteness of real symmetric matrices as well as of
the quadratic forms which they represent. We must bear in mind, of course, that
most symmetric matrices and quadratic forms are neither positive definite nor
negative definite. This is reflected in the existence of critical points at which
there is neither a local minimum nor a local maximum.

We now conclude this section with a reformulation, in terms of the Hessian,
of the rules of search for extreme values. Suppose that $f$ is a function from an
open subset $U$ of $\mathbb{R}^n$ to $\mathbb{R}$, having continuous partial derivatives of first and
second order. To find the local extremes of $f$ in $U$, we first find the critical
points. Then we construct the Hessian at each critical point. At those critical
points where the Hessian is positive definite, $f$ takes on values which are local
minima; at those where this matrix is negative definite, $f$ has local maxima. The
Hessian of $f$ at an arbitrary $\mathbf{x}$ is in fact the value $f''(\mathbf{x})$ of the second derivative of
$f$ at $\mathbf{x}$; it is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$. We recall that $f$ is a function
from $\mathbb{R}^n$ to $\mathbb{R}$ and that $f'(\mathbf{x})$ is the gradient of $f$ at $\mathbf{x}$, grad $f$ being a function from
$\mathbb{R}^n$ to $\mathbb{R}^n$. Therefore the derivative of grad $f$ is a function from $\mathbb{R}^n$ to $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^n)$;
the Jacobian matrix that represents the derivative of grad $f$ (and the second
derivative of $f$) is what we have called the Hessian of $f$. The name comes from a
German mathematician, Otto Hesse (1811-1874). Observe the striking resemblance
between our sufficient conditions for a relative minimum of $f$ at $\mathbf{a}$ and the
corresponding conditions for the case of a function of one real variable. (See
miscellaneous Exercise 2 at the end of Chapter 4.)
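
As a small illustration of these rules, the sketch below forms the Hessian of the function $F(x, y) = x^3 + y^3 + 3xy^2 - 15x - 15y$ of §12.3 at two of its critical points and tests it for definiteness. The test uses the signs of the leading principal minors (Sylvester's criterion) for a symmetric 2-by-2 matrix; that this is the rule quoted without proof in §7.6 is an assumption, and the helper names `hessian_F`, `classify_2x2` and `quadratic_form` are hypothetical.

```python
def hessian_F(x, y):
    """Hessian of F(x, y) = x^3 + y^3 + 3xy^2 - 15x - 15y (second partials by hand)."""
    return [[6.0 * x, 6.0 * y], [6.0 * y, 6.0 * x + 6.0 * y]]

def classify_2x2(A):
    """Classify a symmetric 2-by-2 matrix by its leading principal minors."""
    d1 = A[0][0]
    d2 = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if d1 > 0 and d2 > 0:
        return "positive definite"
    if d1 < 0 and d2 > 0:
        return "negative definite"
    if d2 < 0:
        return "indefinite"
    return "inconclusive"

def quadratic_form(A, h):
    """Evaluate h^T A h as in (12.41-2)."""
    n = len(h)
    return sum(h[i] * A[i][j] * h[j] for i in range(n) for j in range(n))

A = hessian_F(2.0, 1.0)      # critical point in the first quadrant
B = hessian_F(-2.0, -1.0)    # the reflected critical point
print(A, classify_2x2(A))
print(B, classify_2x2(B))
print(quadratic_form(A, [1.0, -1.0]))
```

A positive definite Hessian at (2, 1) is consistent with the local minimum value of -30 found there by steepest descent; the "inconclusive" branch covers the case in which the determinant vanishes and the test gives no decision.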


12.5 / CONTINUOUSLY DIFFERENTIABLE FUNCTIONS 

We now consider further the situation discussed in §12.2 of a function $\mathbf{f}$ from an
open set $U$ of $\mathbb{R}^n$ to $\mathbb{R}^m$. We assume that $\mathbf{f}$ is differentiable in $U$, that is, that $\mathbf{f}'(\mathbf{x})$
exists at each point of $U$. We have sometimes denoted this derivative by $T_{\mathbf{x}}$ or
simply by $T$ to remind us that it is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$, that is,
$T \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$. When we arrive at the main theorem of this chapter (the inverse
function theorem) we shall see the importance of knowing when the derivative
$\mathbf{f}'$, from $U$ to $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$, is a continuous function. For continuity of $\mathbf{f}'$ at a point
$\mathbf{a}$, the requirement is that $\|\mathbf{f}'(\mathbf{x}) - \mathbf{f}'(\mathbf{a})\|$ approach zero as $\|\mathbf{x} - \mathbf{a}\|$ approaches zero.
Since $\mathbf{f}'(\mathbf{x}) - \mathbf{f}'(\mathbf{a})$ is a member of $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$, we use the norm for that space,
which was discussed in §11.8. When the function $\mathbf{f}'$ is continuous, $\mathbf{f}$ is said to be
continuously differentiable.

It turns out that there is a simple way of testing $\mathbf{f}$ to find out if it is
continuously differentiable, by examining the first partial derivatives of the
scalar component functions $f^{(1)}, f^{(2)}, \dots, f^{(m)}$ of the vector-valued function $\mathbf{f}$. It is
customary to say that the function $\mathbf{f}$ is of class $C^{(1)}$ in $U$ in case each of the first
partial derivatives of each of the component functions is continuous in $U$. These
are the partial derivatives, $m \times n$ in number, in the matrix (12.2-2). The following
theorem provides the simple test referred to above for continuous differentiability.

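THEOREM VII. A necessary and sufficient condition that $\mathbf{f}$ be continuously
differentiable in $U$ is that $\mathbf{f}$ belong to class $C^{(1)}$ in $U$.

Proof. Let $\mathbf{x}$ and $\mathbf{a}$ be points of $U$. Then $\mathbf{f}'(\mathbf{x}) - \mathbf{f}'(\mathbf{a})$ is a linear transformation
whose matrix has the difference $f_j^{(i)}(\mathbf{x}) - f_j^{(i)}(\mathbf{a})$ as the element in the $i$th row and
$j$th column. Now let

$$K(\mathbf{x}) = \max_{i,j} \bigl|f_j^{(i)}(\mathbf{x}) - f_j^{(i)}(\mathbf{a})\bigr|. \tag{12.5-1}$$

By (11.8-4) we know that

$$K(\mathbf{x}) \le \|\mathbf{f}'(\mathbf{x}) - \mathbf{f}'(\mathbf{a})\| \le \sqrt{mn}\,K(\mathbf{x}). \tag{12.5-2}$$

The proof of our theorem follows at once from these inequalities; for, saying
that the partial derivatives are continuous at $\mathbf{x} = \mathbf{a}$ is equivalent to saying that
$K(\mathbf{x}) \to 0$ as $\mathbf{x} \to \mathbf{a}$; we see this from (12.5-1). But from (12.5-2) we see that
$K(\mathbf{x}) \to 0$ as $\mathbf{x} \to \mathbf{a}$ if and only if $\|\mathbf{f}'(\mathbf{x}) - \mathbf{f}'(\mathbf{a})\| \to 0$ as $\mathbf{x} \to \mathbf{a}$.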

12.6 / THE FUNDAMENTAL INVERSION THEOREM

In this section we come to the central theorem of Chapter 12. It is about a
system of $n$ equations by which a vector $(x_1, x_2, \dots, x_n)$ in $\mathbb{R}^n$ is mapped into the
vector $(y_1, y_2, \dots, y_n)$ also in $\mathbb{R}^n$. We write the system as

$$\begin{aligned} f^{(1)}(x_1, \dots, x_n) &= y_1, \\ f^{(2)}(x_1, \dots, x_n) &= y_2, \\ &\ \,\vdots \\ f^{(n)}(x_1, \dots, x_n) &= y_n. \end{aligned} \tag{12.6-1}$$


This is the same as the system (12-1) mentioned at the opening of the chapter
except that we now assume $m = n$. If the component functions are linear
functions of the $x_j$’s (as in (12-2)), this system can be handled by linear algebra.
But we want to deal with a more general case in which the $f^{(i)}$’s are not
necessarily linear. We shall use vector notation and vector methods. We write
(12.6-1) as

$$\mathbf{f}(\mathbf{x}) = \mathbf{y}, \tag{12.6-2}$$

and assume that the domain of definition of $\mathbf{f}$ is a neighborhood $\mathcal{N}$ of a certain
point $\mathbf{a}$. We assume that $\mathbf{f}$ is continuous at each point of $\mathcal{N}$. Then $\mathbf{a}$ is mapped
into $\mathbf{f}(\mathbf{a})$ and $\mathcal{N}$ is mapped into a set (the image of $\mathcal{N}$) that we denote by $\mathbf{f}(\mathcal{N})$. The
theorem to which we are coming is about expressing $\mathbf{x}$ as a function of $\mathbf{y}$; or
perhaps it would be better to say that the theorem is about a way of knowing
that the relation between $\mathbf{x}$ and $\mathbf{y}$ can be regarded as one in which $\mathbf{x}$ is a function
of $\mathbf{y}$. Certain assumptions and restrictions must be imposed in order to obtain a
useful result. Otherwise the function $\mathbf{f}$ might not establish a one-to-one relationship
between $\mathcal{N}$ and $\mathbf{f}(\mathcal{N})$; two or more $\mathbf{x}$’s may be mapped into the same $\mathbf{y}$, and
in such a case $\mathbf{x}$ is not uniquely determined by $\mathbf{y}$.

In the case of a linear system, (12.6-2) can be written in the form $A\mathbf{x} = \mathbf{y}$,
where $A$ denotes an $n \times n$ matrix of constants $a_{ij}$ (see (12-2)). We know that the
linear system defines a one-to-one mapping of $\mathbb{R}^n$ onto all of $\mathbb{R}^n$ if the determinant
of the matrix $A$ is not zero, and that then $\mathbf{x}$ can be expressed as a
function of $\mathbf{y}$ in the form $\mathbf{x} = A^{-1}\mathbf{y}$, where $A^{-1}$ is the inverse matrix.

The first question we ask is this: How can we obtain a condition on the
function $\mathbf{f}$ that will be a suitable generalization of the condition that the matrix $A$
have nonzero determinant? We get at the answer by assuming that $\mathbf{f}$ is differentiable.
Then, if $\mathbf{x}$ is a point of $\mathcal{N}$ which is close to $\mathbf{a}$, the difference $\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{a})$ is
approximately equal to $T_0[\mathbf{x} - \mathbf{a}]$, where $T_0 = \mathbf{f}'(\mathbf{a})$ is a member of $\mathcal{L}(R^n)$. This
suggests that we consider whether the linear operator $T_0$ is invertible. In fact, let
us consider what $T_0$ is in the special case in which $\mathbf{y} = \mathbf{f}(\mathbf{x})$ takes the linear form
$\mathbf{y} = A\mathbf{x}$. In that case

\[ f^{(i)}(x_1, \ldots, x_n) = a_{i1} x_1 + \cdots + a_{in} x_n, \]

and $\partial f^{(i)} / \partial x_j = a_{ij}$, so that the matrix $A$ is exactly the matrix of partial derivatives
evaluated at $\mathbf{a}$ exhibited in (12.2-2). Thus $A = T_0$ in this particular case. If $T_0$ is
invertible, the equation $\mathbf{y} - \mathbf{f}(\mathbf{a}) = T_0[\mathbf{x} - \mathbf{a}]$ is uniquely solvable for $\mathbf{x} - \mathbf{a}$ in terms
of $\mathbf{y} - \mathbf{f}(\mathbf{a})$, and it is reasonable to wonder if this may not imply that $\mathbf{y} = \mathbf{f}(\mathbf{x})$ is
uniquely solvable for $\mathbf{x}$ in terms of $\mathbf{y}$ if we restrict all our considerations to $\mathbf{x}$'s
near $\mathbf{a}$ and $\mathbf{y}$'s near $\mathbf{f}(\mathbf{a})$.

The extremely simple case where $n = 1$ is illuminating. In this case if we
assume that the curve $y = f(x)$ is smooth, which means that the derivative $f'(x)$
is continuous, then a small piece of the curve $y = f(x)$ near $x = a$, $y = f(a)$ looks
very much like the tangent line $y - f(a) = f'(a)(x - a)$. In this case, if $f'(a) \neq 0$,
the linear equation can be solved for $x$, and it is also true that, for values of $x$
sufficiently close to $x = a$, the relationship between $x$ and $y$ is one-to-one (see
Fig. 88).

The figure also shows that the relation between $x$ and $y$ need not be
one-to-one for all values of $x$; in Fig. 88, we see that $f(x_1) = f(a) = f(x_2) = f(x_3)$,
and hence there are four values of $x$ for the $y$ value $f(a)$.
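
The one-variable picture can be checked numerically. The following is a minimal sketch, assuming the illustrative cubic $f(x) = x^3 - 3x$ with $a = 0$ (this is not the curve drawn in Fig. 88, and the function names are hypothetical). It shows that $f'(a) \neq 0$, that $f$ is strictly monotonic on a small interval about $a$, and that the value $f(a)$ is nevertheless taken at other, more distant points.

```python
import math

def f(x):
    # Illustrative cubic (an assumption; not the curve drawn in Fig. 88).
    return x**3 - 3*x

def fprime(x):
    return 3*x**2 - 3

a = 0.0
# f'(a) != 0, so the tangent-line equation y - f(a) = f'(a)(x - a) is solvable for x.
slope_at_a = fprime(a)

# For |x - a| < 1 the derivative keeps one sign, so f is strictly decreasing there.
samples = [a - 0.9 + 1.8 * k / 200 for k in range(201)]
values = [f(x) for x in samples]
locally_one_to_one = all(v1 > v2 for v1, v2 in zip(values, values[1:]))

# Globally, the same y-value f(a) = 0 is taken at three different points.
roots = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
same_value = all(abs(f(x) - f(a)) < 1e-12 for x in roots)
```

Here the one-to-one property holds only after $x$ is restricted to a neighborhood of $a$, which is the kind of restriction the main theorem imposes.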

We are now ready to state the main theorem. We use a method of proof that
depends on $\mathbf{f}$ being differentiable throughout an open set and having a derivative
which is continuous at one point (at least) in that set where $\mathbf{f}'$ is an invertible
operator. To obtain the desired conclusion, we have to restrict the values of $\mathbf{x}$ to
some neighborhood of that point, and this restricted neighborhood will (in
general) be only a part of the open set with which we start.

THEOREM VIII. (Fundamental Inversion Theorem). Let $\mathbf{f}$ be a differentiable
function from an open set $\mathcal{N}$ in $R^n$ to $R^n$, and suppose that there is a point $\mathbf{a}$
in $\mathcal{N}$ at which the derivative, $\mathbf{f}'$, is continuous. Suppose further that the linear
operator $\mathbf{f}'(\mathbf{a})$ is invertible. Then there exists in $\mathcal{N}$ some neighborhood $U$ of $\mathbf{a}$
such that

1. $\mathbf{f}$ maps each pair of distinct points of $U$ into two distinct points in $R^n$.

2. The image set $\mathbf{f}(U)$, consisting of all points $\mathbf{f}(\mathbf{x})$ for which $\mathbf{x} \in U$, is
open.

3. The inverse mapping $\mathbf{f}^{-1}$, defined on $\mathbf{f}(U)$ by $\mathbf{f}^{-1}(\mathbf{y}) = \mathbf{x}$ where $\mathbf{x}$ is that
unique point of $U$ such that $\mathbf{y} = \mathbf{f}(\mathbf{x})$, is differentiable at each point of $\mathbf{f}(U)$,
and if we write $\mathbf{f}'(\mathbf{x}) = T$ and denote $\mathbf{f}^{-1}$ for convenience by $\mathbf{g}$, then $\mathbf{g}'(\mathbf{y}) = T^{-1}$.

4. $\mathbf{f}^{-1}$ is continuously differentiable at $\mathbf{f}(\mathbf{a})$ and if $\mathbf{f}$ happens to be
continuously differentiable at other points $\mathbf{x}$ of $U$, then $\mathbf{f}^{-1}$ is continuously
differentiable at the corresponding points $\mathbf{f}(\mathbf{x})$ in $\mathbf{f}(U)$.
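
Conclusion (3) can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it uses the map $\mathbf{f}(x_1, x_2) = (e^{x_1}\cos x_2,\ e^{x_1}\sin x_2)$, which is not taken from the text, together with its explicit local inverse near $\mathbf{a} = (0, 0)$. All function names are hypothetical. It compares a finite-difference approximation of $\mathbf{g}'(\mathbf{y})$ with the inverse of the Jacobian matrix $T = \mathbf{f}'(\mathbf{x})$.

```python
import math

def f(x):
    # Illustrative map from R^2 to R^2 (an assumption, not from the text).
    x1, x2 = x
    return (math.exp(x1) * math.cos(x2), math.exp(x1) * math.sin(x2))

def fprime(x):
    # Jacobian matrix T = f'(x); row i holds the partials of component i.
    x1, x2 = x
    e = math.exp(x1)
    return [[e * math.cos(x2), -e * math.sin(x2)],
            [e * math.sin(x2),  e * math.cos(x2)]]

def g(y):
    # Local inverse of f, valid for y near f(a) = (1, 0) where a = (0, 0).
    y1, y2 = y
    return (0.5 * math.log(y1 * y1 + y2 * y2), math.atan2(y2, y1))

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[ m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det,  m[0][0] / det]]

def numerical_jacobian(func, p, step=1e-6):
    # Central differences; cols[j][i] approximates d(func_i)/d(p_j).
    cols = []
    for j in range(2):
        plus, minus = list(p), list(p)
        plus[j] += step
        minus[j] -= step
        fp, fm = func(plus), func(minus)
        cols.append([(fp[i] - fm[i]) / (2 * step) for i in range(2)])
    return [[cols[j][i] for j in range(2)] for i in range(2)]

x = (0.2, 0.3)
y = f(x)
T_inv = inverse_2x2(fprime(x))
g_prime = numerical_jacobian(g, y)
max_error = max(abs(g_prime[i][j] - T_inv[i][j]) for i in range(2) for j in range(2))

# f is not globally one-to-one: shifting x2 by 2*pi gives the same image point.
y_shifted = f((0.2, 0.3 + 2 * math.pi))
```

The last line shows why the theorem is stated for a neighborhood $U$ only: for this map, $\mathbf{f}'(\mathbf{x})$ is invertible everywhere, yet $\mathbf{f}$ is not one-to-one on all of $R^2$.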

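Proof. We take the proof in steps, corresponding to the assertions (1) to (4)
of the theorem.

Step 1. For convenience we write $T_0$ for $\mathbf{f}'(\mathbf{a})$ and $T$ for $\mathbf{f}'(\mathbf{x})$, bearing in mind, then,
that $T$ depends on $\mathbf{x}$. By hypothesis $T_0$ is invertible and so for each $\mathbf{w}$ in $R^n$,
$\mathbf{w} = T_0^{-1}[T_0(\mathbf{w})]$. Therefore

\[ \|\mathbf{w}\| \le \|T_0^{-1}\|\,\|T_0[\mathbf{w}]\|, \quad\text{or}\quad \frac{\|\mathbf{w}\|}{\|T_0^{-1}\|} \le \|T_0[\mathbf{w}]\|. \tag{12.6-3} \]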


This relation will be used later. To simplify the notation we shall denote
$1/\|T_0^{-1}\|$ by $m$.

Because $T_0$ is invertible we know that $T$ will also be invertible if $\|T - T_0\|$ is
sufficiently small. In fact, we know by Theorem IV in §11.10 that $T$ will be
invertible if $\|T - T_0\| < m$. Now because $\mathbf{f}$ is continuously differentiable at $\mathbf{a}$, we
can insure $\|T - T_0\| = \|\mathbf{f}'(\mathbf{x}) - \mathbf{f}'(\mathbf{a})\| < m$ if we keep $\|\mathbf{x} - \mathbf{a}\|$ sufficiently small. We
choose a positive number $c$ small enough so that two things are true:

(i) The open ball of radius $c$ centered at $\mathbf{a}$ is contained in $\mathcal{N}$.

(ii) For each positive number $r < c$, $\operatorname*{l.u.b.}_{\|\mathbf{x} - \mathbf{a}\| \le r} \|T - T_0\| < m$.

We shall denote by $\phi(r)$ the least upper bound of $\|T - T_0\|$ referred to in (ii).

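Now we introduce $U$ as the open ball $B(\mathbf{a}, c) = \{\mathbf{x} : \|\mathbf{x} - \mathbf{a}\| < c\}$; its image set,
$\mathbf{f}(U)$, is of course the set of all points $\mathbf{f}(\mathbf{x})$ where $\mathbf{x} \in U$. Let $\mathbf{u}$ and $\mathbf{v}$ be two distinct
points of $U$. We shall apply the law of the mean, formula (12.4-1), which we can
do because the line segment from $\mathbf{u}$ to $\mathbf{v}$ lies in $U$ and hence in $\mathcal{N}$, where $\mathbf{f}$ is
continuous and differentiable. So we have that for each $\mathbf{w}$ there exists some $\boldsymbol{\xi}$
strictly between $\mathbf{u}$ and $\mathbf{v}$ such that

\[ (\mathbf{f}(\mathbf{v}) - \mathbf{f}(\mathbf{u})) \cdot \mathbf{w} = \mathbf{f}'(\boldsymbol{\xi})[\mathbf{v} - \mathbf{u}] \cdot \mathbf{w}. \tag{12.6-4} \]

Since $\mathbf{u}$ and $\mathbf{v}$ both belong to $U$, $\|\mathbf{u} - \mathbf{a}\|$ and $\|\mathbf{v} - \mathbf{a}\|$ are each less than $c$. Let $r$
denote the larger of these two numbers. The closed ball $\bar{B}(\mathbf{a}, r)$ is contained in $U$,
and since balls are convex, $\boldsymbol{\xi} \in \bar{B}(\mathbf{a}, r)$. Since $\|\boldsymbol{\xi} - \mathbf{a}\| \le r$, $\|\mathbf{f}'(\boldsymbol{\xi}) - \mathbf{f}'(\mathbf{a})\| \le \phi(r) < m$.
For convenience let us write $\mathbf{f}'(\boldsymbol{\xi}) = T_1$. We have that $\|T_1 - T_0\| \le \phi(r)$. Now (12.6-4)
can be written in the form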

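\[ [\mathbf{f}(\mathbf{v}) - \mathbf{f}(\mathbf{u})] \cdot \mathbf{w} = T_1(\mathbf{v} - \mathbf{u}) \cdot \mathbf{w}, \]

from which we can get

\[ |(\mathbf{f}(\mathbf{v}) - \mathbf{f}(\mathbf{u})) \cdot \mathbf{w}| = |(T_1 - T_0)[\mathbf{v} - \mathbf{u}] \cdot \mathbf{w} + T_0[\mathbf{v} - \mathbf{u}] \cdot \mathbf{w}|; \]

applying the triangle inequality (see (11.5-2)) to this we find that

\[ |(\mathbf{f}(\mathbf{v}) - \mathbf{f}(\mathbf{u})) \cdot \mathbf{w}| \ge |T_0[\mathbf{v} - \mathbf{u}] \cdot \mathbf{w}| - |(T_1 - T_0)[\mathbf{v} - \mathbf{u}] \cdot \mathbf{w}|. \]

But because

\[ |(T_1 - T_0)(\mathbf{v} - \mathbf{u}) \cdot \mathbf{w}| \le \|T_1 - T_0\|\,\|\mathbf{v} - \mathbf{u}\|\,\|\mathbf{w}\| \le \phi(r)\|\mathbf{v} - \mathbf{u}\|\,\|\mathbf{w}\|, \]

and

\[ |(\mathbf{f}(\mathbf{v}) - \mathbf{f}(\mathbf{u})) \cdot \mathbf{w}| \le \|\mathbf{f}(\mathbf{v}) - \mathbf{f}(\mathbf{u})\|\,\|\mathbf{w}\|, \]

we see that

\[ \|\mathbf{f}(\mathbf{v}) - \mathbf{f}(\mathbf{u})\|\,\|\mathbf{w}\| \ge |T_0[\mathbf{v} - \mathbf{u}] \cdot \mathbf{w}| - \phi(r)\|\mathbf{v} - \mathbf{u}\|\,\|\mathbf{w}\|. \]

In the last part of the foregoing reasoning, we have twice used the Cauchy
inequality (10.12-5).

We can now conclude this part of the proof by exploiting the arbitrariness of $\mathbf{w}$
to choose it in such a way that $\|\mathbf{w}\| = 1$ and $|T_0[\mathbf{v} - \mathbf{u}] \cdot \mathbf{w}| = \|T_0[\mathbf{v} - \mathbf{u}]\|$. Such a
choice is $\mathbf{w} = T_0[\mathbf{v} - \mathbf{u}] / \|T_0[\mathbf{v} - \mathbf{u}]\|$. Then we have

\[ \|\mathbf{f}(\mathbf{v}) - \mathbf{f}(\mathbf{u})\| \ge \|T_0[\mathbf{v} - \mathbf{u}]\| - \phi(r)\|\mathbf{v} - \mathbf{u}\|. \]

But we know from (12.6-3) that $\|T_0[\mathbf{v} - \mathbf{u}]\| \ge m\|\mathbf{v} - \mathbf{u}\|$, and so we obtain the result

\[ \|\mathbf{f}(\mathbf{v}) - \mathbf{f}(\mathbf{u})\| \ge (m - \phi(r))\|\mathbf{v} - \mathbf{u}\|. \tag{12.6-5} \]

This implies that $\mathbf{f}(\mathbf{v}) \neq \mathbf{f}(\mathbf{u})$ when $\mathbf{v} \neq \mathbf{u}$ and completes the proof of Step 1.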
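
The lower bound (12.6-5) can be tested on a concrete case. The sketch below is only an illustration under stated assumptions: it takes $n = 1$, $f(x) = x + x^2$ and $a = 0$ (an example not found in the text), for which $T_0 = f'(0) = 1$, $m = 1$ and $\phi(r) = 2r$, so that any $c < 1/2$ satisfies requirement (ii). The names `phi`, `c` and `worst_margin` are hypothetical.

```python
import random

def f(x):
    # Illustrative one-variable example (an assumption): f(x) = x + x^2, a = 0.
    return x + x * x

m = 1.0   # m = 1/||T0^{-1}|| with T0 = f'(0) = 1
c = 0.4   # radius of U; chosen so that phi(r) < m for every r < c

def phi(r):
    # l.u.b. of |f'(x) - f'(0)| = |2x| over |x| <= r
    return 2.0 * r

random.seed(0)
worst_margin = float("inf")
for _ in range(10000):
    u = random.uniform(-c, c)
    v = random.uniform(-c, c)
    r = max(abs(u), abs(v))
    lhs = abs(f(v) - f(u))              # left side of (12.6-5)
    rhs = (m - phi(r)) * abs(v - u)     # right side of (12.6-5)
    worst_margin = min(worst_margin, lhs - rhs)
```

In this example the inequality can also be seen directly, since $|f(v) - f(u)| = |v - u|\,|1 + u + v|$ and $|u + v| \le 2r$.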

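Step 2. To show that $\mathbf{f}(U)$ is open we shall prove that to each point $\mathbf{x}_0$ in $U$
corresponds some positive number $\rho$, which will depend on $\mathbf{x}_0$, such that the open
ball with center $\mathbf{f}(\mathbf{x}_0)$ and radius $\rho$ is contained in $\mathbf{f}(U)$. As we shall see in the
course of the proof, once $\mathbf{x}_0$ is chosen a suitable choice of $\rho$ can be made as
follows: If $\mathbf{x}_0 \in U$, then $\|\mathbf{x}_0 - \mathbf{a}\| < c$, and so we can choose $r$ so that $\|\mathbf{x}_0 - \mathbf{a}\| < r < c$.
We do this and then define

\[ \rho = \tfrac{1}{2}\{m - \phi(r)\}(r - \|\mathbf{x}_0 - \mathbf{a}\|), \tag{12.6-6} \]

where $m$ and $\phi(r)$ are as defined in Step 1.

What we shall prove is that if $\mathbf{y}$ is in $R^n$ but not in $\mathbf{f}(U)$, then $\|\mathbf{y} - \mathbf{f}(\mathbf{x}_0)\| \ge \rho$, for
this is equivalent to showing that, if $\|\mathbf{y} - \mathbf{f}(\mathbf{x}_0)\| < \rho$, then $\mathbf{y}$ must be in $\mathbf{f}(U)$.
Suppose, then, that there is a particular $\mathbf{y}$, say $\mathbf{y} = \mathbf{b}$, in $R^n$ but not in $\mathbf{f}(U)$. We
begin by letting

\[ d = \operatorname{g.l.b.} \|\mathbf{f}(\mathbf{x}) - \mathbf{b}\| \quad\text{for all } \mathbf{x} \text{ such that } \|\mathbf{x} - \mathbf{a}\| \le r. \]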

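Because $\|\mathbf{f}(\mathbf{x}) - \mathbf{b}\|$ is a continuous function of $\mathbf{x}$ on the closed and bounded set of
$\mathbf{x}$'s for which $\|\mathbf{x} - \mathbf{a}\| \le r$, it follows that the lower bound $d$ is attained at some point
$\mathbf{x}_1$, that is, that $d = \|\mathbf{f}(\mathbf{x}_1) - \mathbf{b}\|$. This $\mathbf{x}_1$ is clearly in $U$, whence $\mathbf{f}(\mathbf{x}_1) \in \mathbf{f}(U)$, and
therefore $d > 0$, because $\mathbf{b}$ is not in $\mathbf{f}(U)$. We shall prove that $\|\mathbf{x}_1 - \mathbf{a}\| = r$. Our
method will be to show that if we assume $\|\mathbf{x}_1 - \mathbf{a}\| < r$ we are led to a contradiction.
So, suppose $\|\mathbf{x}_1 - \mathbf{a}\| < r$. The contradiction to which we shall come is that $d < d$,
and we shall arrive at this by an adroit use of the defining property of the
derivative of $\mathbf{f}$ at $\mathbf{x}_1$. Because $\mathbf{x}_1 \in U$, $T_1 = \mathbf{f}'(\mathbf{x}_1)$ is invertible; there is, therefore, a
unique vector $\mathbf{h}$ such that $T_1\mathbf{h} = \mathbf{b} - \mathbf{f}(\mathbf{x}_1)$. Evidently $\mathbf{h} \neq \mathbf{0}$, because $\mathbf{b} \neq \mathbf{f}(\mathbf{x}_1)$. We
know nothing else about the size of $\|\mathbf{h}\|$, but we can make $\|t\mathbf{h}\| = |t|\,\|\mathbf{h}\|$ small by
making $t$ small. We wish to make sure that $\|\mathbf{x}_1 + t\mathbf{h} - \mathbf{a}\| < r$. We can do this as
follows: Choose $t$ so that $0 < t$ and $t\|\mathbf{h}\| < r - \|\mathbf{x}_1 - \mathbf{a}\|$. (Here we make use of the
assumption that $\|\mathbf{x}_1 - \mathbf{a}\| < r$.) Then

\[ \|\mathbf{x}_1 + t\mathbf{h} - \mathbf{a}\| \le \|\mathbf{x}_1 - \mathbf{a}\| + t\|\mathbf{h}\| < r. \]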

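Because $\mathbf{f}$ is differentiable at $\mathbf{x}_1$ it follows (see (12.1-4)) that

\[ \lim_{t \to 0} \frac{\|\mathbf{f}(\mathbf{x}_1 + t\mathbf{h}) - \mathbf{f}(\mathbf{x}_1) - T_1(t\mathbf{h})\|}{t\|\mathbf{h}\|} = 0. \]

Let us therefore impose on $t$ the additional requirement that it be small enough to
make

\[ \frac{\|\mathbf{f}(\mathbf{x}_1 + t\mathbf{h}) - \mathbf{f}(\mathbf{x}_1) - T_1(t\mathbf{h})\|}{t\|\mathbf{h}\|} < \frac{d}{2\|\mathbf{h}\|}. \]

Then

\[ \|\mathbf{f}(\mathbf{x}_1 + t\mathbf{h}) - \mathbf{f}(\mathbf{x}_1) - T_1(t\mathbf{h})\| < \frac{td}{2}. \tag{12.6-7} \]

The reason for the choice of the magnitude $d/2\|\mathbf{h}\|$ in the inequality before (12.6-7)
will appear presently. We place one more restriction on $t$, namely, that $t < 1$.
These three restrictions are compatible and achievable.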

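From the definition of $d$ we know that

\[ d \le \|\mathbf{f}(\mathbf{x}_1 + t\mathbf{h}) - \mathbf{b}\|. \]

But $\mathbf{b} = \mathbf{f}(\mathbf{x}_1) + T_1\mathbf{h}$, and so we see that

\[ d \le \|\mathbf{f}(\mathbf{x}_1 + t\mathbf{h}) - \mathbf{f}(\mathbf{x}_1) - T_1\mathbf{h}\|. \]

From the triangle inequality we see that

\[ d \le \|\mathbf{f}(\mathbf{x}_1 + t\mathbf{h}) - \mathbf{f}(\mathbf{x}_1) - T_1\mathbf{h}\| \le \|\mathbf{f}(\mathbf{x}_1 + t\mathbf{h}) - \mathbf{f}(\mathbf{x}_1) - T_1(t\mathbf{h})\| + \|T_1(t\mathbf{h}) - T_1(\mathbf{h})\|. \tag{12.6-8} \]

But $\|T_1(t\mathbf{h}) - T_1(\mathbf{h})\| = \|(t - 1)T_1\mathbf{h}\| = (1 - t)\|\mathbf{b} - \mathbf{f}(\mathbf{x}_1)\| = (1 - t)d$.

Therefore, from (12.6-7) and (12.6-8) we see that

\[ d < \frac{td}{2} + (1 - t)d = d\left(1 - \frac{t}{2}\right) < d. \]

This is the contradiction we have been seeking, so we must conclude that $\|\mathbf{x}_1 - \mathbf{a}\| = r$.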

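We now return to the consideration of $\|\mathbf{b} - \mathbf{f}(\mathbf{x}_0)\|$. We are trying to show that it is
bounded away from 0. Now

\[ \|\mathbf{b} - \mathbf{f}(\mathbf{x}_0)\| = \|\mathbf{b} - \mathbf{f}(\mathbf{x}_1) + \mathbf{f}(\mathbf{x}_1) - \mathbf{f}(\mathbf{x}_0)\| \ge \|\mathbf{f}(\mathbf{x}_1) - \mathbf{f}(\mathbf{x}_0)\| - \|\mathbf{b} - \mathbf{f}(\mathbf{x}_1)\|. \]

But $\|\mathbf{b} - \mathbf{f}(\mathbf{x}_1)\| = d \le \|\mathbf{b} - \mathbf{f}(\mathbf{x}_0)\|$, and

\[ \|\mathbf{f}(\mathbf{x}_1) - \mathbf{f}(\mathbf{x}_0)\| \ge \{m - \phi(r)\}\|\mathbf{x}_1 - \mathbf{x}_0\|, \]

as we see by applying (12.6-5) with $\mathbf{x}_0$ and $\mathbf{x}_1$ in place of $\mathbf{u}$ and $\mathbf{v}$, respectively.
[This is legitimate, as we can see by reviewing the reasoning leading to (12.6-5),
because $\max\{\|\mathbf{x}_1 - \mathbf{a}\|, \|\mathbf{x}_0 - \mathbf{a}\|\} = r$.] Thus we see from the foregoing that

\[ \|\mathbf{b} - \mathbf{f}(\mathbf{x}_0)\| \ge \{m - \phi(r)\}\|\mathbf{x}_1 - \mathbf{x}_0\| - \|\mathbf{b} - \mathbf{f}(\mathbf{x}_0)\|, \]

or

\[ \|\mathbf{b} - \mathbf{f}(\mathbf{x}_0)\| \ge \tfrac{1}{2}\{m - \phi(r)\}\|\mathbf{x}_1 - \mathbf{x}_0\|. \]

Finally,

\[ \|\mathbf{x}_1 - \mathbf{x}_0\| = \|\mathbf{x}_1 - \mathbf{a} + \mathbf{a} - \mathbf{x}_0\| \ge \|\mathbf{x}_1 - \mathbf{a}\| - \|\mathbf{x}_0 - \mathbf{a}\| = r - \|\mathbf{x}_0 - \mathbf{a}\|, \]

and so

\[ \|\mathbf{b} - \mathbf{f}(\mathbf{x}_0)\| \ge \tfrac{1}{2}\{m - \phi(r)\}(r - \|\mathbf{x}_0 - \mathbf{a}\|) = \rho. \]

With this we have done what we set out to do at the beginning of this Step 2 part
of the proof of the theorem.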

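Step 3. It is clear from the already proven conclusions (1) and (2) that the mapping $\mathbf{f}$
from $U$ to $\mathbf{f}(U)$ has an inverse $\mathbf{f}^{-1}$; for convenience we denote it by $\mathbf{g}$. Let $\mathbf{y}$ be
any point of $\mathbf{f}(U)$ and let $\mathbf{x} = \mathbf{g}(\mathbf{y})$, $T = \mathbf{f}'(\mathbf{x})$. We know from the details of Step 1
that $T$ is invertible. What we have to prove is that

\[ \lim_{\|\mathbf{k}\| \to 0} \frac{\|\mathbf{g}(\mathbf{y} + \mathbf{k}) - \mathbf{g}(\mathbf{y}) - T^{-1}\mathbf{k}\|}{\|\mathbf{k}\|} = 0, \tag{12.6-9} \]

for this is what is meant by the claim that $\mathbf{g}$ is differentiable at $\mathbf{y}$ and that
$\mathbf{g}'(\mathbf{y}) = T^{-1}$.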

Before undertaking to prove (12.6-9), however, let us explain why it is plausible
to suppose that $T^{-1}$ is the correct value for $\mathbf{g}'(\mathbf{y})$. We know that $\mathbf{f}(\mathbf{g}(\mathbf{y})) = \mathbf{y}$,
because $\mathbf{g} = \mathbf{f}^{-1}$. If we knew that $\mathbf{g}$ were differentiable, we could use the chain rule
to conclude that the derivative of $\mathbf{f}(\mathbf{g}(\mathbf{y}))$ would be $\mathbf{f}'(\mathbf{g}(\mathbf{y}))\mathbf{g}'(\mathbf{y}) = T\mathbf{g}'(\mathbf{y})$. On the
other hand, the derivative of the function whose value at $\mathbf{y}$ is $\mathbf{y}$ is clearly the
identity operator on $R^n$. (See Exercise 4 at the end of the chapter.)
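
Written out as a single chain of implications, the plausibility argument reads as follows. This is only a restatement of the paragraph above; it assumes, as that paragraph does, that $\mathbf{g}$ is already known to be differentiable, and it writes $I$ for the identity operator on $R^n$.

```latex
\[
  \mathbf{f}(\mathbf{g}(\mathbf{y})) = \mathbf{y}
  \;\Longrightarrow\;
  \mathbf{f}'(\mathbf{g}(\mathbf{y}))\,\mathbf{g}'(\mathbf{y}) = T\,\mathbf{g}'(\mathbf{y}) = I
  \;\Longrightarrow\;
  \mathbf{g}'(\mathbf{y}) = T^{-1}.
\]
```

The last implication uses the invertibility of $T$, which was established in Step 1.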

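We now proceed to prove (12.6-9). We assume that $\|\mathbf{k}\|$ is small enough to insure
that $\mathbf{y} + \mathbf{k} \in \mathbf{f}(U)$. Let $\mathbf{h} = \mathbf{g}(\mathbf{y} + \mathbf{k}) - \mathbf{g}(\mathbf{y})$. Then $\mathbf{g}(\mathbf{y} + \mathbf{k}) = \mathbf{x} + \mathbf{h}$ and $\mathbf{y} + \mathbf{k} = \mathbf{f}(\mathbf{x} + \mathbf{h})$.
Because the mapping is one-to-one, it follows that $\mathbf{h} \neq \mathbf{0}$ if $\mathbf{k} \neq \mathbf{0}$. Observe that

\[ \mathbf{g}(\mathbf{y} + \mathbf{k}) - \mathbf{g}(\mathbf{y}) = \mathbf{h} = T^{-1}T[\mathbf{h}] \]

and

\[ T^{-1}[\mathbf{k}] = T^{-1}[\mathbf{f}(\mathbf{x} + \mathbf{h}) - \mathbf{f}(\mathbf{x})], \]

so that the norm in the numerator of (12.6-9) becomes

\[ \|T^{-1}\{T[\mathbf{h}] - [\mathbf{f}(\mathbf{x} + \mathbf{h}) - \mathbf{f}(\mathbf{x})]\}\|, \]

which is less than or equal to

\[ \|T^{-1}\|\,\|\mathbf{f}(\mathbf{x} + \mathbf{h}) - \mathbf{f}(\mathbf{x}) - T[\mathbf{h}]\|. \]

For the denominator in (12.6-9) we have

\[ \|\mathbf{k}\| = \|\mathbf{f}(\mathbf{x} + \mathbf{h}) - \mathbf{f}(\mathbf{x})\| \ge (m - \phi(r))\|\mathbf{h}\|, \]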

by applying (12.6-5) with $\mathbf{v} = \mathbf{x} + \mathbf{h}$, $\mathbf{u} = \mathbf{x}$. The $r$ here is the maximum of $\|\mathbf{x} + \mathbf{h} - \mathbf{a}\|$
and $\|\mathbf{x} - \mathbf{a}\|$; it may change as $\mathbf{h} \to \mathbf{0}$, but we know that $\|\mathbf{x} + \mathbf{h} - \mathbf{a}\| \le \|\mathbf{x} - \mathbf{a}\| + \|\mathbf{h}\|$, and
so, if $r_0$ is chosen so that $\|\mathbf{x} - \mathbf{a}\| < r_0 < c$, we can be assured that $r < r_0$ if $\|\mathbf{h}\|$ is
sufficiently small. Then, since $\phi(r) \le \phi(r_0)$ (by the way in which $\phi$ is defined in
Step 1 of this proof), we see that

\[ \|\mathbf{k}\| \ge (m - \phi(r_0))\|\mathbf{h}\|. \tag{12.6-10} \]

Thus the quotient on the left side in (12.6-9) does not exceed

\[ \frac{\|T^{-1}\|}{m - \phi(r_0)} \cdot \frac{\|\mathbf{f}(\mathbf{x} + \mathbf{h}) - \mathbf{f}(\mathbf{x}) - T[\mathbf{h}]\|}{\|\mathbf{h}\|}. \tag{12.6-11} \]

Here $\mathbf{h}$ is variable, depending on $\mathbf{k}$, and we see from (12.6-10) that $\|\mathbf{h}\| \to 0$ as
$\|\mathbf{k}\| \to 0$. But then the expression in (12.6-11) converges to 0 as $\|\mathbf{h}\| \to 0$, by virtue of
the fact that $T = \mathbf{f}'(\mathbf{x})$. This completes the proof of Step 3.


Step 4. The argument here is very simple. We recall that $\mathbf{g}'(\mathbf{y}) = T^{-1}$ where $T = \mathbf{f}'(\mathbf{g}(\mathbf{y}))$.
Now $\mathbf{g}$ is continuous because, as we proved in Step 3, it is differentiable at
each point $\mathbf{y}$ of $\mathbf{f}(U)$, and $T^{-1}$ depends continuously on $T$ (by Theorem V of §11.10).
Therefore, by application of the composition theorem (Theorem II, §11.7), $\mathbf{g}'$ is
continuous at each point $\mathbf{y} = \mathbf{f}(\mathbf{x})$ for which $\mathbf{x} = \mathbf{g}(\mathbf{y})$ is a point in $U$ at which $\mathbf{f}'$ is
continuous; of course $\mathbf{a}$ is one such point.


A few comments are in order about Theorem VIII. 


1. It is a powerful result showing how the invertibility of a system of nonlinear 
simultaneous equations can be inferred from the invertibility of a closely related 
linear system. 

2. It is a local, rather than a global, theorem. That is, it draws conclusions merely
about what happens in some sufficiently small neighborhood of a point. By
contrast, Cramer's rule is a global theorem about a system of simultaneous linear
equations.

3. The proof we have given is independent of the particular dimension number $n$, and
does not require us to display the co-ordinates of any of the vectors. It is a good
example of the effectiveness of the use of vector space methods. Finally it should
be noted that the assumptions used are sufficient to conclude invertibility but not
necessary. That is, there are instances in which the mapping may be invertible,
even globally, and yet some of our assumptions may not be valid (e.g., $\mathbf{f}'(\mathbf{a})$ may
fail to be invertible). An elementary example is given in Exercise 17.


12.7 / THE IMPLICIT FUNCTION THEOREM 

An important introduction to this theorem is given in §8 of Chapter VIII. The 
student is advised to reread it at this point. The introduction which we give here 
is somewhat similar but is intended to introduce the implicit function theorem as 
an extension to nonlinear analysis of a well-known theorem in linear algebra. 

If one has three simultaneous homogeneous linear equations in five variables,
for example,

\[
\begin{aligned}
a_1 x_1 + a_2 x_2 + \cdots + a_5 x_5 &= 0 \\
b_1 x_1 + b_2 x_2 + \cdots + b_5 x_5 &= 0 \\
c_1 x_1 + c_2 x_2 + \cdots + c_5 x_5 &= 0,
\end{aligned}
\tag{12.7-1}
\]

it makes sense to ask whether the system implies that some three of the
unknowns, say $x_3$, $x_4$, and $x_5$, are functions of the other two. By transposing the
terms involving the other two variables to the other side and using Cramer's rule,
we see that a necessary and sufficient condition that $x_3$, $x_4$, and $x_5$ be implicitly
determined uniquely as functions of $x_1$ and $x_2$ is that the determinant

\[
\begin{vmatrix}
a_3 & a_4 & a_5 \\
b_3 & b_4 & b_5 \\
c_3 & c_4 & c_5
\end{vmatrix}
\]

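be different from zero. In other words, by using the inverse function theorem for
linear functions, we have obtained an implicit function theorem. We shall now
obtain a partial extension to nonlinear functions, that is, to systems like

\[
\begin{aligned}
f_1(x_1, x_2, x_3, x_4, x_5) &= 0 \\
f_2(x_1, x_2, x_3, x_4, x_5) &= 0 \\
f_3(x_1, x_2, x_3, x_4, x_5) &= 0
\end{aligned}
\tag{12.7-2}
\]

which reduce to (12.7-1) when the functions are linear. Notice that in the linear
case, the determinant above, whose nonvanishing is so important, is the same as
the Jacobian determinant

\[
\begin{vmatrix}
\dfrac{\partial f_1}{\partial x_3} & \dfrac{\partial f_1}{\partial x_4} & \dfrac{\partial f_1}{\partial x_5} \\[2ex]
\dfrac{\partial f_2}{\partial x_3} & \dfrac{\partial f_2}{\partial x_4} & \dfrac{\partial f_2}{\partial x_5} \\[2ex]
\dfrac{\partial f_3}{\partial x_3} & \dfrac{\partial f_3}{\partial x_4} & \dfrac{\partial f_3}{\partial x_5}
\end{vmatrix}.
\]
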

Expressed in this way, this same condition will also be of central importance in 
the nonlinear case. 
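
The linear case can be carried out explicitly. The following sketch is only an illustration: the coefficient rows are hypothetical numbers chosen for the example, and the helper names are not from the text. It transposes the $x_1$ and $x_2$ terms of (12.7-1) to the right-hand side and then applies Cramer's rule to solve for $x_3$, $x_4$, $x_5$.

```python
from fractions import Fraction

def det3(m):
    # Determinant of a 3x3 matrix by cofactor expansion along the first row.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Hypothetical coefficients (rows a, b, c of (12.7-1)), chosen only for illustration.
rows = [[1, 2, 1, 0, 1],
        [0, 1, 2, 1, 0],
        [3, 0, 1, 1, 1]]

def solve_for_last_three(x1, x2):
    # Transpose the x1, x2 terms to the right-hand side, then apply Cramer's rule.
    M = [[Fraction(r[j]) for j in (2, 3, 4)] for r in rows]
    rhs = [-(r[0] * x1 + r[1] * x2) for r in rows]
    D = det3(M)
    if D == 0:
        raise ValueError("determinant is zero: x3, x4, x5 are not determined by x1, x2")
    solution = []
    for col in range(3):
        Mc = [row[:] for row in M]
        for i in range(3):
            Mc[i][col] = Fraction(rhs[i])
        solution.append(det3(Mc) / D)
    return solution

x1, x2 = 1, 2
x3, x4, x5 = solve_for_last_three(x1, x2)
# Substituting back into the three original equations should give zero residuals.
residuals = [r[0]*x1 + r[1]*x2 + r[2]*x3 + r[3]*x4 + r[4]*x5 for r in rows]
```

Exact rational arithmetic is used so that the residual check is not affected by rounding; the zero-determinant branch corresponds to the case in which the condition of the text fails.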

The equations (12.7-2) can advantageously be written

\[ f_i(x_1, x_2, y_1, y_2, y_3) = 0 \qquad (i = 1, 2, 3). \tag{12.7-3} \]

The change in notation, writing $y_1$, $y_2$, $y_3$ instead of $x_3$, $x_4$, $x_5$, is to help us
remember that we want to solve for the $y$'s in terms of the $x$'s. Further, the
ordered pair $(x_1, x_2)$ will be denoted by the vector $\mathbf{x}$ and we shall let $(y_1, y_2, y_3) = \mathbf{y}$.
The three equations then become

\[ f_i(\mathbf{x}, \mathbf{y}) = 0 \qquad (i = 1, 2, 3). \]

For each $i$, $f_i$ is defined on ordered pairs of the form $(\mathbf{x}, \mathbf{y})$ where $\mathbf{x} \in R^2$ and
$\mathbf{y} \in R^3$, and each of the three functions is real valued. Since an ordered triple of
real-valued functions is commonly thought of as a vector-valued function taking
its values in $R^3$, the system of three equations can conveniently be thought of as

\[ \mathbf{f}(\mathbf{x}, \mathbf{y}) = \mathbf{0}. \tag{12.7-4} \]

We have to remember of course that $\mathbf{x} \in R^2$ and $\mathbf{y} \in R^3$, and that $\mathbf{f}$ takes its values
in $R^3$.

If $A$ and $B$ are sets, then the set of all ordered pairs of the form $(a, b)$,
where $a \in A$ and $b \in B$, is called the Cartesian product of $A$ and $B$ and is
denoted by $A \times B$. Ordered pairs $(\mathbf{x}, \mathbf{y})$ introduced in the preceding paragraph
belong to $R^2 \times R^3$; thus we say that $\mathbf{f}$ is a function from some subset of $R^2 \times R^3$ to
$R^3$. Since the points of $R^2 \times R^3$ are ordered pairs of the form $(\mathbf{x}, \mathbf{y})$, where $\mathbf{x}$ is an
ordered pair of real numbers and $\mathbf{y}$ is an ordered triple of real numbers, $(\mathbf{x}, \mathbf{y})$ can
be identified with an ordered quintuple of real numbers. Hence, it is common to
identify $R^2 \times R^3$ with $R^5$.

In order to get clearly in mind what is meant by solving (12.7-4) for $\mathbf{y}$ as a
function of $\mathbf{x}$, consider the following very simple special case.

\[ f(x, y) = 3x + 2y - 5 = 0. \]

This can be solved at a glance to get

\[ y = \phi(x) = \tfrac{1}{2}(5 - 3x). \]

The essential feature of this function $\phi(x)$ is that if we substitute it for $y$ in the
equation $f(x, y) = 0$, we get an identity, namely

\[ f(x, \phi(x)) = 3x + 2\phi(x) - 5 = 3x + (5 - 3x) - 5 = 0. \]

In this special instance, the identity holds for all $x$. In more complicated cases,
we may have to settle for identities which are valid only over some subset of the
vector space to which $\mathbf{x}$ belongs. And in the more general case of the equation
$\mathbf{f}(\mathbf{x}, \mathbf{y}) = \mathbf{0}$, to solve for $\mathbf{y}$ in terms of $\mathbf{x}$ means to find a function $\boldsymbol{\phi}(\mathbf{x})$ such that
$\mathbf{f}[\mathbf{x}, \boldsymbol{\phi}(\mathbf{x})] = \mathbf{0}$, at least for all $\mathbf{x}$ belonging to some set in $R^2$.

Consider the set of all pairs $(\mathbf{x}, \mathbf{y})$ such that $\mathbf{f}(\mathbf{x}, \mathbf{y}) = \mathbf{0}$. We call this the
solution set for the given equation. To avoid dealing with a situation which is of
no interest, we must assume that the equation does have solutions, that is, that
the solution set is not empty. We are interested in knowing whether this set has
the property that when $(\mathbf{x}_1, \mathbf{y}_1)$ and $(\mathbf{x}_2, \mathbf{y}_2)$ both belong and $\mathbf{x}_1 = \mathbf{x}_2$, then necessarily
$\mathbf{y}_1 = \mathbf{y}_2$. If it does have this property, then $\mathbf{y}$ is determined as a function of
$\mathbf{x}$.

There are instances in which the solution set does not determine $\mathbf{y}$ uniquely
as a function of $\mathbf{x}$. For example, to take a case in which $x$ and $y$ are both in $R$,
suppose $f(x, y) = x^2 + y^2 - 1$. Then $(0, 1)$ and $(0, -1)$ are both in the solution set,
so that there are two values of $y$ (instead of only one) corresponding to $x = 0$.
But if we start with the pair $(0, 1)$ and confine attention to pairs $(x, y)$ of the
solution set for which $x$ is close to 0 and $y$ is close to 1, we find that this
restricted portion of the solution set does define $y$ uniquely as a function of $x$,
the formula being $y = \sqrt{1 - x^2}$ (positive square root). This restriction of attention
to all points $(x, y)$ of the solution set close to a particular point $(a, b)$ of the
solution set is a standard feature of implicit function theorems.
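
The circle example can be checked directly. The short sketch below, in which the names `phi`, `xs` and `max_residual` are hypothetical, confirms that two points of the solution set lie above $x = 0$, and that on the branch through $(0, 1)$ the substitution $y = \sqrt{1 - x^2}$ turns $f(x, y) = 0$ into an identity for $x$ near 0.

```python
import math

def f(x, y):
    return x * x + y * y - 1.0

def phi(x):
    # Branch of the solution set through (0, 1): the positive square root.
    return math.sqrt(1.0 - x * x)

# Both (0, 1) and (0, -1) lie in the solution set, so y is not a function of x globally.
two_solutions_at_zero = (f(0.0, 1.0) == 0.0) and (f(0.0, -1.0) == 0.0)

# Near (0, 1), substituting y = phi(x) gives an identity in x (up to rounding).
xs = [-0.5 + k * 0.01 for k in range(101)]
max_residual = max(abs(f(x, phi(x))) for x in xs)
```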

Now for the implicit function theorem. We shall state and prove it for the
case we have been discussing of three equations with five variables, but only
trivial changes in notation are required to get a proof valid for $m$ equations in $n$
variables, where $m < n$.

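THEOREM IX. (AN IMPLICIT FUNCTION THEOREM). Suppose that $W$ is an
open subset of $R^5 = R^2 \times R^3$ and that $\mathbf{f}$ is a differentiable function of class $C^{(1)}$
from $W$ to $R^3$. Assume further that there is a point $(\mathbf{a}, \mathbf{b})$ in $W$ such that
$\mathbf{f}(\mathbf{a}, \mathbf{b}) = \mathbf{0}$, and assume that the derivative of $\mathbf{f}(\mathbf{a}, \mathbf{y})$, as a function of $\mathbf{y}$, is
nonsingular at $\mathbf{y} = \mathbf{b}$. Then the equation $\mathbf{f}(\mathbf{x}, \mathbf{y}) = \mathbf{0}$ determines $\mathbf{y}$ uniquely as a
$C^{(1)}$ function of $\mathbf{x}$ near $(\mathbf{a}, \mathbf{b})$. More precisely, there is some neighborhood $E$
of $\mathbf{b}$ in $R^3$ and some neighborhood $S$ of $\mathbf{a}$ in $R^2$ and a continuously
differentiable function $\boldsymbol{\phi}$ from $S$ to $R^3$ such that $\boldsymbol{\phi}(\mathbf{x}) \in E$ and $\mathbf{f}(\mathbf{x}, \boldsymbol{\phi}(\mathbf{x})) = \mathbf{0}$
whenever $\mathbf{x}$ is in $S$. Furthermore, the only points $\mathbf{y}$ of $E$ which satisfy
$\mathbf{f}(\mathbf{x}, \mathbf{y}) = \mathbf{0}$ for some $\mathbf{x}$ in $S$ are those for which $\mathbf{y} = \boldsymbol{\phi}(\mathbf{x})$.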

Proof. Our main tool is the inverse function theorem for functions from $R^n$
to $R^n$. Since our function $\mathbf{f}$ is from $R^5$ to $R^3$, the tool seems ill-suited to the task,
but this disparity can easily be remedied. In the special case of the linear
problem (12.7-1), we reduced it to a problem about functions from $R^3$ to $R^3$ by
transposing the $x_1$ and the $x_2$ to the other side. In the nonlinear case (12.7-3),
there is no way to get the $x_1$ and $x_2$ to the other side, so we artificially inflate the
problem to one about functions from $R^5$ to $R^5$ in the following trivial way. We
define functions $F_1, \ldots, F_5$ as follows:

\[
\begin{aligned}
F_1(x_1, x_2, y_1, y_2, y_3) &= x_1 \\
F_2(x_1, x_2, y_1, y_2, y_3) &= x_2 \\
F_3(x_1, x_2, y_1, y_2, y_3) &= f_1(x_1, x_2, y_1, y_2, y_3) \\
F_4(x_1, x_2, y_1, y_2, y_3) &= f_2(x_1, x_2, y_1, y_2, y_3) \\
F_5(x_1, x_2, y_1, y_2, y_3) &= f_3(x_1, x_2, y_1, y_2, y_3)
\end{aligned}
\]

Notice that this vector-valued function $\mathbf{F}$ can be defined more briefly as follows:

\[ \mathbf{F}(\mathbf{x}, \mathbf{y}) = (\mathbf{x}, \mathbf{f}(\mathbf{x}, \mathbf{y})). \]

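Since $\mathbf{f}$ is continuously differentiable, Theorem III tells us that the same is true
of $\mathbf{F}$. By Theorem III, the derivative of $\mathbf{F}$ at $(\mathbf{a}, \mathbf{b})$ is represented by the Jacobian
matrix

\[
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\[1ex]
\dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \dfrac{\partial f_1}{\partial y_1} & \dfrac{\partial f_1}{\partial y_2} & \dfrac{\partial f_1}{\partial y_3} \\[2ex]
\dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \dfrac{\partial f_2}{\partial y_1} & \dfrac{\partial f_2}{\partial y_2} & \dfrac{\partial f_2}{\partial y_3} \\[2ex]
\dfrac{\partial f_3}{\partial x_1} & \dfrac{\partial f_3}{\partial x_2} & \dfrac{\partial f_3}{\partial y_1} & \dfrac{\partial f_3}{\partial y_2} & \dfrac{\partial f_3}{\partial y_3}
\end{pmatrix}
\]

where all the partial derivatives are evaluated at $(\mathbf{a}, \mathbf{b})$. We wish to show that the
linear transformation represented by this matrix is nonsingular.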

Notice that the $3 \times 3$ submatrix in the lower right-hand corner is nonsingular,
because it represents the derivative of $\mathbf{f}(\mathbf{a}, \mathbf{y})$ as a function of $\mathbf{y}$ at $\mathbf{y} = \mathbf{b}$; in the
upper left-hand corner we have the $2 \times 2$ identity matrix having determinant 1.
So the determinant of the $5 \times 5$ matrix has the same value as that of the
determinant of the $3 \times 3$ submatrix in the lower right-hand corner. In other
words, the derivative of $\mathbf{F}$ at $(\mathbf{a}, \mathbf{b})$ is nonsingular.
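
The determinant claim can be verified on numbers. In the sketch below the entries standing for the partial derivatives at $(\mathbf{a}, \mathbf{b})$ are hypothetical values chosen only for illustration, and the names `det`, `df_dx` and `df_dy` are not from the text. The $5 \times 5$ matrix is assembled in the block form shown above and its determinant is compared with that of the $3 \times 3$ lower right-hand block.

```python
def det(m):
    # Determinant by cofactor expansion along the first row (fine for small matrices).
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

# Hypothetical values of the partial derivatives at (a, b), for illustration only.
df_dx = [[4, -1],
         [0,  7],
         [2,  5]]       # 3x2 block: partials with respect to x1, x2
df_dy = [[2, 1, 0],
         [1, 3, 1],
         [0, 1, 4]]     # 3x3 block: partials with respect to y1, y2, y3

identity_rows = [[1, 0, 0, 0, 0],
                 [0, 1, 0, 0, 0]]
jacobian_F = identity_rows + [df_dx[i] + df_dy[i] for i in range(3)]

det_F = det(jacobian_F)
det_block = det(df_dy)
```

The entries of the lower left-hand $3 \times 2$ block have no influence on the result, because expanding along the first two rows removes the first two columns.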

We observe that $\mathbf{F}$ maps $(\mathbf{a}, \mathbf{b})$ into the point $(\mathbf{a}, \mathbf{0})$ of $R^2 \times R^3$. By applying the
inverse function theorem (Theorem VIII) to $\mathbf{F}$, we see that there must exist a
neighborhood $\mathcal{U}$ of $(\mathbf{a}, \mathbf{b})$ in $R^5$ such that $\mathcal{U}$ is contained in $W$ and such that $\mathbf{F}$
defines a one-to-one mapping of $\mathcal{U}$ onto a neighborhood $Y$ of $(\mathbf{a}, \mathbf{0})$ in $R^5$.
Moreover, the inverse mapping $\mathbf{F}^{-1}$ is of class $C^{(1)}$ on $Y$. It is easy to prove (see
Exercise 21) that every neighborhood of $(\mathbf{a}, \mathbf{b})$ contains a neighborhood of this
point which is a Cartesian product $D \times E$, where $D$ is a neighborhood of $\mathbf{a}$ in $R^2$
and $E$ is a neighborhood of $\mathbf{b}$ in $R^3$; we can assume that $\mathcal{U}$ itself is such a
Cartesian product, and shall do this for convenience.

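Let us write points of $Y$ in the form $(\mathbf{x}, \mathbf{z})$, where $\mathbf{x}$ is in $R^2$ and $\mathbf{z}$ belongs to
$R^3$. To each $(\mathbf{x}, \mathbf{z})$ in $Y$, there corresponds a unique point $\mathbf{F}^{-1}(\mathbf{x}, \mathbf{z}) = (\mathbf{x}, \mathbf{y})$ in $\mathcal{U}$
such that $\mathbf{f}(\mathbf{x}, \mathbf{y}) = \mathbf{z}$ and $\mathbf{F}(\mathbf{x}, \mathbf{y}) = (\mathbf{x}, \mathbf{z})$. Thus, $\mathbf{y}$ is determined uniquely by $\mathbf{x}$ and
$\mathbf{z}$, and this defines $\mathbf{y}$ as a function of $\mathbf{x}$ and $\mathbf{z}$, say $\mathbf{y} = \mathbf{g}(\mathbf{x}, \mathbf{z})$. In particular,
$\mathbf{g}(\mathbf{a}, \mathbf{0}) = \mathbf{b}$. Then $\mathbf{F}^{-1}(\mathbf{x}, \mathbf{z}) = (\mathbf{x}, \mathbf{g}(\mathbf{x}, \mathbf{z}))$. Since $\mathbf{F}^{-1}$ is of class $C^{(1)}$, it is readily seen
that $\mathbf{g}$ is also of class $C^{(1)}$.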

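Let $S$ be the set of $\mathbf{x}$'s such that $(\mathbf{x}, \mathbf{0})$ is in $Y$. Clearly, $S$ is contained in $D$,
since $(\mathbf{x}, \mathbf{0})$ comes from some point in $\mathcal{U} = D \times E$ by the mapping $\mathbf{F}$. Observe that
$(\mathbf{a}, \mathbf{0})$ comes from $(\mathbf{a}, \mathbf{b})$. It is easy to prove (see Exercise 20) that $S$ is an open
subset of $R^2$. Let us define a function $\boldsymbol{\phi}$ on $S$ by the formula $\boldsymbol{\phi}(\mathbf{x}) = \mathbf{g}(\mathbf{x}, \mathbf{0})$.
Observe that $\boldsymbol{\phi}(\mathbf{x})$ is in $R^3$ and that $(\mathbf{x}, \boldsymbol{\phi}(\mathbf{x}))$ is in $\mathcal{U}$ when $\mathbf{x}$ is in $S$. In particular,
$\boldsymbol{\phi}(\mathbf{a}) = \mathbf{b}$ and $\boldsymbol{\phi}(\mathbf{x})$ is in $E$. Observe also that $\mathbf{F}^{-1}(\mathbf{x}, \mathbf{0}) = (\mathbf{x}, \boldsymbol{\phi}(\mathbf{x}))$. This means that
$(\mathbf{x}, \mathbf{0}) = \mathbf{F}(\mathbf{x}, \boldsymbol{\phi}(\mathbf{x})) = (\mathbf{x}, \mathbf{f}[\mathbf{x}, \boldsymbol{\phi}(\mathbf{x})])$, and hence, that $\mathbf{f}(\mathbf{x}, \boldsymbol{\phi}(\mathbf{x})) = \mathbf{0}$. The function $\boldsymbol{\phi}$ is
of class $C^{(1)}$ on $S$, because $\boldsymbol{\phi}(\mathbf{x}) = \mathbf{g}(\mathbf{x}, \mathbf{0})$ and we know that $\mathbf{g}$ is of class $C^{(1)}$.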

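Finally, to prove the last assertion in Theorem IX, assume that $\mathbf{y}$ is a point of
$E$ such that $\mathbf{f}(\mathbf{x}, \mathbf{y}) = \mathbf{0}$ for some $\mathbf{x}$ in $S$. Then $\mathbf{F}(\mathbf{x}, \mathbf{y}) = (\mathbf{x}, \mathbf{0})$ is in $Y$, and hence,
since the mapping of $\mathcal{U}$ onto $Y$ is one-to-one, we are assured that $\mathbf{y} = \mathbf{g}(\mathbf{x}, \mathbf{0}) = \boldsymbol{\phi}(\mathbf{x})$.
This completes the proof.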

In conclusion, we shall now state a more general form of the implicit 
function theorem. Except for minor changes in notation, the proof is the same as 
that for the special case just treated. 

THEOREM X (THE IMPLICIT FUNCTION THEOREM). Suppose that $W$ is an
open subset of $R^{p+q}$ (which we shall identify with $R^p \times R^q$) and let $\mathbf{f}$ be a
continuously differentiable function from $W$ to $R^q$. Assume further that there
is a point $(\mathbf{a}, \mathbf{b})$ in $W$ such that $\mathbf{f}(\mathbf{a}, \mathbf{b}) = \mathbf{0}$, and such that the derivative of
$\mathbf{f}(\mathbf{a}, \mathbf{y})$ as a function of $\mathbf{y}$ is nonsingular at $\mathbf{y} = \mathbf{b}$. Then the equation $\mathbf{f}(\mathbf{x}, \mathbf{y}) = \mathbf{0}$
determines $\mathbf{y}$ uniquely as a continuously differentiable function of $\mathbf{x}$ near
$(\mathbf{a}, \mathbf{b})$. More precisely, there is some neighborhood $E$ of $\mathbf{b}$ in $R^q$, some
neighborhood $S$ of $\mathbf{a}$ in $R^p$, and a continuously differentiable function $\boldsymbol{\phi}$ from
$S$ to $R^q$ such that $\boldsymbol{\phi}(\mathbf{x}) \in E$ and $\mathbf{f}(\mathbf{x}, \boldsymbol{\phi}(\mathbf{x})) = \mathbf{0}$ whenever $\mathbf{x}$ is in $S$. Furthermore,
the only points $\mathbf{y}$ of $E$ which satisfy $\mathbf{f}(\mathbf{x}, \mathbf{y}) = \mathbf{0}$ for some $\mathbf{x}$ in $S$ are
those for which $\mathbf{y} = \boldsymbol{\phi}(\mathbf{x})$.
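
The construction used in the proof of Theorem IX can be traced in the simplest case $p = q = 1$. The sketch below is an illustration under stated assumptions: the equation $f(x, y) = e^y + x - 2 = 0$ and the point $(a, b) = (1, 0)$ are chosen for the example and are not from the text, and the explicit formula for `g` is available only because this particular equation can be solved by hand. It forms the inflated map $F(x, y) = (x, f(x, y))$, writes the second component $g$ of $F^{-1}$, and defines $\phi(x) = g(x, 0)$.

```python
import math

def f(x, y):
    # Illustrative scalar equation f(x, y) = 0 with p = q = 1 (an assumption).
    return math.exp(y) + x - 2.0

def F(x, y):
    # The "inflated" map of the proof: F(x, y) = (x, f(x, y)).
    return (x, f(x, y))

def g(x, z):
    # Second component of F^{-1}: the y with f(x, y) = z (needs z + 2 - x > 0).
    return math.log(z + 2.0 - x)

def phi(x):
    # The implicit function: phi(x) = g(x, 0).
    return g(x, 0.0)

a, b = 1.0, 0.0
xs = [a - 0.5 + k * 0.01 for k in range(101)]   # sample points in a neighborhood S of a
max_residual = max(abs(f(x, phi(x))) for x in xs)

# F^{-1}(F(x, y)) = (x, y) at a sample point near (a, b).
x0, y0 = 1.1, -0.2
x_back, z = F(x0, y0)
y_back = g(x_back, z)
```

Here $\partial f / \partial y = e^y$ equals 1 at $y = b$, so the nonsingularity hypothesis holds, and the resulting $\phi(x) = \log(2 - x)$ satisfies $\phi(a) = b$.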

12.8 / DIFFERENTIATION OF SCALAR PRODUCTS OF 
VECTOR VALUED FUNCTIONS OF A VECTOR VARIABLE 

We recall from elementary calculus that if two real functions of one real variable
are differentiable, the product of the functions is also differentiable, and the
formula for the derivative of the product is

\[ (fg)'(x) = f(x)g'(x) + f'(x)g(x). \]

This formula has a counterpart for the case of the dot product (also called scalar
product or inner product) of $\mathbf{f}(\mathbf{x})$ and $\mathbf{g}(\mathbf{x})$ where $\mathbf{f}$ and $\mathbf{g}$ are functions from $R^n$ to
$R^m$. Recall from (10.12-1) the definition of the dot product in $R^n$.

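THEOREM XI. Let $\mathbf{f}$ and $\mathbf{g}$ be functions from an open subset of $R^n$ to $R^m$, and
define $\phi$ from $R^n$ to $R$ by

\[ \phi(\mathbf{x}) = \mathbf{f}(\mathbf{x}) \cdot \mathbf{g}(\mathbf{x}). \]

If $\mathbf{f}$ and $\mathbf{g}$ are each differentiable at a particular $\mathbf{x}$, so is $\phi$, and

\[ d\phi(\mathbf{x}, \mathbf{h}) = \phi'(\mathbf{x}) \cdot \mathbf{h} = \mathbf{f}(\mathbf{x}) \cdot \mathbf{g}'(\mathbf{x})\mathbf{h} + \mathbf{f}'(\mathbf{x})\mathbf{h} \cdot \mathbf{g}(\mathbf{x}), \tag{12.8-1} \]

for each vector $\mathbf{h}$ in $R^n$.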
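
Formula (12.8-1) can be checked numerically. The sketch below is only an illustration: the two maps from $R^2$ to $R^2$ and the names `f_prime`, `g_prime`, `differential` and `directional_difference` are hypothetical choices, not taken from the text. It compares the right-hand side of (12.8-1) with a central-difference quotient of $\phi$ in the direction $\mathbf{h}$.

```python
import math

def f(x):
    x1, x2 = x
    return [x1 * x2, x1 + x2 ** 2]

def f_prime(x):
    # Jacobian matrix of f.
    x1, x2 = x
    return [[x2, x1],
            [1.0, 2 * x2]]

def g(x):
    x1, x2 = x
    return [math.sin(x1), x1 - x2]

def g_prime(x):
    # Jacobian matrix of g.
    x1, x2 = x
    return [[math.cos(x1), 0.0],
            [1.0, -1.0]]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def phi(x):
    return dot(f(x), g(x))

def differential(x, h):
    # Right-hand side of (12.8-1): f(x) . g'(x)h + f'(x)h . g(x)
    return dot(f(x), matvec(g_prime(x), h)) + dot(matvec(f_prime(x), h), g(x))

def directional_difference(x, h, t=1e-6):
    # Central-difference approximation to the derivative of phi along h.
    plus = [xi + t * hi for xi, hi in zip(x, h)]
    minus = [xi - t * hi for xi, hi in zip(x, h)]
    return (phi(plus) - phi(minus)) / (2 * t)

x = [0.7, -0.4]
h = [0.3, 1.1]
error = abs(differential(x, h) - directional_difference(x, h))
```

A small value of `error` is consistent with, but of course does not prove, the theorem; the proof follows.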

Proof. The special case in which one of the two factors, $\mathbf{f}$ and $\mathbf{g}$, is a
constant vector is treated in Exercise 7(a). This more general result follows by
an extension of the reasoning which establishes the special case. The method of
proof is to make straightforward use of the definition of differentiability as
applied to $\mathbf{f}$ and $\mathbf{g}$, and then recognize from what we get the fact that
$\mathbf{f}(\mathbf{x}) \cdot \mathbf{g}'(\mathbf{x})\mathbf{h} + \mathbf{f}'(\mathbf{x})\mathbf{h} \cdot \mathbf{g}(\mathbf{x})$ fulfills the requirement for being the differential of $\phi$ at
$\mathbf{x}$, which means that it is a good approximation to $\phi(\mathbf{x} + \mathbf{h}) - \phi(\mathbf{x})$ in the
appropriate sense when $\|\mathbf{h}\|$ is small.

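We begin by writing

\[ \mathbf{f}(\mathbf{x} + \mathbf{h}) = \mathbf{f}(\mathbf{x}) + \mathbf{f}'(\mathbf{x})\mathbf{h} + \boldsymbol{\epsilon}(\mathbf{h})\|\mathbf{h}\|, \tag{12.8-2} \]

\[ \mathbf{g}(\mathbf{x} + \mathbf{h}) = \mathbf{g}(\mathbf{x}) + \mathbf{g}'(\mathbf{x})\mathbf{h} + \boldsymbol{\eta}(\mathbf{h})\|\mathbf{h}\|, \tag{12.8-3} \]

where $\boldsymbol{\epsilon}$ and $\boldsymbol{\eta}$ are vector functions of $\mathbf{h}$ which approach zero when $\mathbf{h}$ does.
[Compare with the use of $\epsilon$ as a function of $h_1, \ldots, h_n$ in (7-2).] Now form

\[ \Delta\phi = \mathbf{f}(\mathbf{x} + \mathbf{h}) \cdot \mathbf{g}(\mathbf{x} + \mathbf{h}) - \mathbf{f}(\mathbf{x}) \cdot \mathbf{g}(\mathbf{x}), \tag{12.8-4} \]

and substitute from (12.8-2) and (12.8-3) on the right in (12.8-4). By the
properties of the scalar product in Exercise 8 of §10.12 one can express the first
dot product on the right in (12.8-4) as a sum of nine terms, one of which is
$\mathbf{f}(\mathbf{x}) \cdot \mathbf{g}(\mathbf{x})$. On looking carefully at the other eight terms, we can see that

\[
\begin{aligned}
\phi(\mathbf{x} + \mathbf{h}) - \phi(\mathbf{x}) = {} & \mathbf{f}(\mathbf{x}) \cdot \mathbf{g}'(\mathbf{x})\mathbf{h} + \mathbf{f}'(\mathbf{x})\mathbf{h} \cdot \mathbf{g}(\mathbf{x}) \\
& + \|\mathbf{h}\|\{\text{a sum of five terms}\} \\
& + \mathbf{f}'(\mathbf{x})\mathbf{h} \cdot \mathbf{g}'(\mathbf{x})\mathbf{h}.
\end{aligned}
\]

The sum of the first two terms here on the right of the equality sign is evidently
linear in $\mathbf{h}$, and therefore of the type desired for the differential of $\phi$. What is
needed, then, is to show that the remaining expressions on the right, taken
together, have an absolute value less than or equal to $\|\mathbf{h}\|$ times some function of
$\mathbf{h}$ that approaches zero as $\mathbf{h}$ does. This is not very difficult, and we leave it to the
student to carry out the steps (Exercise 28).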

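For purposes of applications, it is advantageous to translate (12.8-1) from
the language of differentials to the language of derivatives. This reformulation
can be broken down into several small steps. Since dot multiplication of vectors
is commutative, we can write

\[ \phi'(\mathbf{x}) \cdot \mathbf{h} = \mathbf{f}(\mathbf{x}) \cdot \mathbf{g}'(\mathbf{x})\mathbf{h} + \mathbf{g}(\mathbf{x}) \cdot \mathbf{f}'(\mathbf{x})\mathbf{h}, \]

and expressing dot products in terms of matrix multiplication (§11.4) leads to

\[ \phi'(\mathbf{x}) \cdot \mathbf{h} = [\mathbf{f}(\mathbf{x})]^T[\mathbf{g}'(\mathbf{x})\mathbf{h}] + [\mathbf{g}(\mathbf{x})]^T[\mathbf{f}'(\mathbf{x})\mathbf{h}]. \]

Using the fact that matrix multiplication is associative,

\[
\begin{aligned}
\phi'(\mathbf{x}) \cdot \mathbf{h} &= \{[\mathbf{f}(\mathbf{x})]^T\mathbf{g}'(\mathbf{x})\}\mathbf{h} + \{[\mathbf{g}(\mathbf{x})]^T\mathbf{f}'(\mathbf{x})\}\mathbf{h} \\
&= [\mathbf{f}(\mathbf{x})^T\mathbf{g}'(\mathbf{x}) + \mathbf{g}(\mathbf{x})^T\mathbf{f}'(\mathbf{x})]\mathbf{h}.
\end{aligned}
\]

In Exercise 4, Chapter 11, it is indicated that the transpose of the sum of two
matrices is the sum of their transposes, and that the transpose of a product is the
product of the transposes in the reverse order. Using these two facts we get

\[ \phi'(\mathbf{x}) \cdot \mathbf{h} = [\mathbf{g}'(\mathbf{x})^T\mathbf{f}(\mathbf{x}) + \mathbf{f}'(\mathbf{x})^T\mathbf{g}(\mathbf{x})]^T\mathbf{h}, \]

and from (11.4-2) this can be written as the dot product of the vector in brackets
with $\mathbf{h}$, that is,

\[ d\phi(\mathbf{x}, \mathbf{h}) = \phi'(\mathbf{x}) \cdot \mathbf{h} = [\mathbf{g}'(\mathbf{x})^T\mathbf{f}(\mathbf{x}) + \mathbf{f}'(\mathbf{x})^T\mathbf{g}(\mathbf{x})] \cdot \mathbf{h}. \]

Comparing this with (12.2-8) we see that the only way this can hold for all $\mathbf{h}$ is
for the vector in brackets to be the gradient of $\phi$ at $\mathbf{x}$. Therefore

\[ \phi'(\mathbf{x}) = \operatorname{grad} \phi(\mathbf{x}) = \mathbf{g}'(\mathbf{x})^T\mathbf{f}(\mathbf{x}) + \mathbf{f}'(\mathbf{x})^T\mathbf{g}(\mathbf{x}), \]

and we have arrived at the following reformulation of THEOREM XI.

THEOREM XI'. Under the hypotheses of THEOREM XI where $\phi(\mathbf{x}) = \mathbf{f}(\mathbf{x}) \cdot \mathbf{g}(\mathbf{x})$,
$\phi$ is differentiable at $\mathbf{x}$ and

\[ \phi'(\mathbf{x}) = \mathbf{g}'(\mathbf{x})^T\mathbf{f}(\mathbf{x}) + \mathbf{f}'(\mathbf{x})^T\mathbf{g}(\mathbf{x}). \tag{12.8-5} \]

Notice that when $\mathbf{f} = \mathbf{g}$, this formula reduces to

\[ \phi'(\mathbf{x}) = 2\mathbf{f}'(\mathbf{x})^T\mathbf{f}(\mathbf{x}). \tag{12.8-6} \]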

Example 1. Consider the following system of $m$ linear equations in $n$
unknowns,

\[ A\mathbf{x} = \mathbf{b} \tag{12.8-7} \]

where $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, $\mathbf{b} = (b_1, \ldots, b_m)$ is given, and $A$ is a given $m \times n$
matrix.

Perhaps there is no $\mathbf{x}$ which satisfies the equation (12.8-7). If there is such an
$\mathbf{x}$, then of course $A\mathbf{x} - \mathbf{b} = \mathbf{0}$, and hence $\|A\mathbf{x} - \mathbf{b}\|^2 = 0$. Otherwise, $A\mathbf{x} - \mathbf{b}$ is different
from zero for all $\mathbf{x}$. Even when no solution exists, one sometimes needs to find
a vector which comes as close as possible to satisfying (12.8-7). To be more
precise, notice that (12.8-7) associates with each vector $\mathbf{x}$ the vector $A\mathbf{x} - \mathbf{b}$,
which is called the residual of $\mathbf{x}$. We now broaden our point of view regarding
(12.8-7); instead of looking at it as an equation to be solved, we take the problem
to be that of finding the vector having the smallest possible residual, that is, we
try to make $\|A\mathbf{x} - \mathbf{b}\|$ as small as possible, which is equivalent to making $\|A\mathbf{x} - \mathbf{b}\|^2$
an absolute minimum. If (12.8-7) does have solutions, this approach will lead us
to one, since the residual of a solution is the zero vector. Otherwise we shall find
the nearest thing to a solution, in the sense of a vector whose residual has the
least possible norm. Such a vector is called a least squares solution of (12.8-7).
If its residual is not actually the zero vector, then a least squares solution is not,
of course, a solution in the strict sense. It is a good example of what is
frequently called a "generalized solution," which usually refers to something
obtained by relaxing somewhat the requirements that a true solution would have
to satisfy. Generalized solutions have great importance in some areas of
mathematics.

To find a least squares solution of (12.8-7), we begin with the function 

$$\phi(x) = \|Ax - b\|^2 = (Ax - b) \cdot (Ax - b),$$

and try to minimize it. What we have here is a function from $R^n$ to R which, by 
Theorem X, is clearly differentiable everywhere, since Ax - b is. By §7.6, the 
critical points of $\phi$ are the solutions of 

$$\operatorname{grad} \phi(x) = \phi'(x) = 0,$$

and by putting f(x) = Ax - b in (12.8-6), the critical point equation becomes 

$$\phi'(x) = 2A^T(Ax - b) = 0,$$

since in this case, f'(x) = A for all x. The above equation is clearly equivalent to 

$$A^T A x = A^T b.$$ (12.8-8)

Recall that A is an m x n matrix, making $A^T A$ an n x n matrix; and since b can 
here be thought of as an m x 1 matrix, $A^T b$ is an n x 1 matrix, or a vector in $R^n$. 
The point now is that the n x n matrix $A^T A$ may be invertible, even when 
the matrix A of our original problem is not. (An example in which this possibility 
is realized is found in Exercise 29.) When this turns out to be the case, (12.8-8) 
has the unique solution 

$$x = (A^T A)^{-1} A^T b.$$ (12.8-9)

This shows that when $A^T A$ is invertible, the function $\phi$ has a unique critical 
point, namely the x given by (12.8-9). 
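
As an illustration of (12.8-8) and (12.8-9), the following minimal sketch forms $A^T A$ and $A^T b$ and solves the resulting square system by Gaussian elimination. The matrix A and the vector b are made-up data (three equations in two unknowns with no exact solution), and the helper names `transpose`, `matmul`, `solve`, and `least_squares` are hypothetical, chosen only for this example.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(P, Q):
    """Return the matrix product P Q."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def solve(M, rhs):
    """Solve the square system M z = rhs by elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("matrix is singular (or nearly so)")
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def least_squares(A, b):
    """Solve the normal equations A^T A x = A^T b, assuming A^T A is invertible."""
    At = transpose(A)
    AtA = matmul(At, A)
    Atb = [sum(a * bi for a, bi in zip(row, b)) for row in At]
    return solve(AtA, Atb)


# Made-up data: Ax = b has no exact solution here.
A = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
b = [0.0, 1.0, 1.0]
x = least_squares(A, b)

# The residual Ax - b of the least squares solution is not the zero vector.
residual = [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]
print(x, residual)
```

For this data $A^T A$ has rows (5, 3) and (3, 3), and $A^T b = (3, 2)$, so the least squares solution is x = (1/2, 1/6). Forming $A^T A$ explicitly follows the text; it is not necessarily the numerically preferred route for large or ill-conditioned problems.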

There are several ways we can see that $\phi$ has its absolute minimum at this 
critical point. The function is obviously nonnegative and takes on the value $\|b\|^2$ 
at x = 0. It is easily seen that $\phi(x)$ becomes very large when $\|x\|$ is large, and 
therefore, if the positive number R is chosen large enough, we can be sure that 
$\phi(x) > \|b\|^2$ if $\|x\| > R$. It then follows (from what theorem in Chapter 6?) that $\phi(x)$ 
has an absolute minimum value for some x, and that the minimum occurs for an 
x such that $\|x\| \le R$. Exercise 30 asks the student to write out the argument in 
full. It is very much like Exercises 4 and 5 of §6.3. 

Another way of identifying the critical point as a minimum point is to find 
the Hessian matrix of $\phi$. This method was developed in §12.41. It is easy to see 
that this Hessian matrix is just the second derivative of $\phi$. Since the first 
derivative of $\phi$ is $2(A^T A x - A^T b)$, we see at once that the Hessian, H, is simply 
$2A^T A$. We now consider the quadratic form $h^T H h$. Since 

$$h^T H h = 2h^T A^T A h = 2(Ah)^T Ah = 2(Ah) \cdot (Ah) = 2\|Ah\|^2,$$

we see that this quadratic form can never be negative. But neither can it be zero 
unless h = 0; because suppose there were an $h \ne 0$ such that $h^T H h = 0$. Then, by 
the above equations Ah = 0 and therefore $A^T A h = 0$. But $A^T A$ is nonsingular, 
and therefore maps only the zero vector into zero. This contradiction forces us 
to conclude that the Hessian is positive definite. From §12.41 we know that the 
unique x given by (12.8-9) makes $\phi$ a minimum, and since this x is the only 
critical point, $\phi$ must have its absolute minimum there. 

Since (12.8-9) gives us a generalized solution of (12.8-7), $(A^T A)^{-1} A^T$ is called 
a generalized inverse of A in those cases where $A^T A$ is invertible. There is vastly 
more to this subject, but we must leave it at this point. One final thing: If n = m 
and if A is invertible, so is $A^T$, and in that case 

$$(A^T A)^{-1} A^T = A^{-1} (A^T)^{-1} A^T = A^{-1},$$

showing that if A is invertible, then the generalized inverse turns out to be the 
same as the inverse. 

Example 2. If A is an n x n symmetric matrix (so that $A^T = A$), then $x \cdot Ax$ is 
a quadratic form in n variables $x_1, \ldots, x_n$. Consider the problem of seeking the 
extreme values of $x \cdot Ax$ subject to the constraint that $\|x\|^2 = 1$, that is, that 
$x \cdot x = 1$. Since all vectors having unit length are called unit vectors, and since 
the set of all unit vectors is called the unit sphere, our problem is one of finding 
the extreme values of a quadratic form on the unit sphere. 

Using the Lagrange multiplier method of §6.8, we form the function 

$$\phi(x) = x \cdot Ax - \lambda(x \cdot x) = x \cdot (Ax - \lambda x),$$

calculate $\phi'(x)$ and examine the equation $\phi'(x) = 0$. To calculate the derivative 
we apply (12.8-5) with f(x) = x and $g(x) = Ax - \lambda x$. Then f'(x) = I and 
$g'(x) = A - \lambda I$, where I is the identity matrix (for which Ix = x for all x). The result is 

$$\phi'(x) = (A - \lambda I)^T x + I^T (Ax - \lambda x).$$

Since I is represented by the matrix with 1's along the main diagonal and 0's 
everywhere else, it is obvious that $I^T = I$. Since A is symmetric and $A - \lambda I$ 
differs from A only along the main diagonal, $A - \lambda I$ is also symmetric. This gives 

$$\phi'(x) = (A - \lambda I)x + Ax - \lambda x = 2(Ax - \lambda x),$$

so $\phi'(x) = 0$ is equivalent to the equation 

$$Ax = \lambda x.$$ (12.8-10)

This reveals that any vector that solves our original problem must have the 
interesting property that when it is transformed by the operator A, the result is 
exactly the same as when it is multiplied by a certain scalar. Such vectors are 
called characteristic vectors of A. For each characteristic vector v, the scalar by 
which we can multiply v to get Av is called the characteristic value associated 
with v. 

In matrix algebra we learn that for the system (12.8-10) there are n distinct 
characteristic vectors $\{u_k\}_1^n$ of unit length, and n associated characteristic values 
(which may or may not be distinct). Thus we have 

$$A u_k = \lambda_k u_k, \qquad k = 1, 2, \ldots, n.$$

Recall that the extreme values that the quadratic form has on the unit sphere can 
occur only on certain of these $u_k$'s. Now notice that it is easy to see what values 
$x \cdot Ax$ has on these characteristic vectors; we simply take the dot product of 
both sides of the above equation with $u_k$. This gives 


$$u_k \cdot A u_k = \lambda_k u_k \cdot u_k = \lambda_k.$$


In other words, on each unit characteristic vector the value which the quadratic 
form assumes is simply the associated characteristic value. In matrix algebra 
courses one learns to find these characteristic values. The largest of them is 
always the absolute maximum of x • Ax on the unit sphere, and the smallest is 
the absolute minimum of this quadratic form subject to the constraint ||x|| = 1. 
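
The conclusion of Example 2 can be checked numerically in a small case. The sketch below is only an illustration under stated assumptions: the 2 x 2 symmetric matrix A is made up (its characteristic values are 1 and 3), and the name `quadratic_form` is hypothetical. It samples $x \cdot Ax$ at many points of the unit circle and records the largest and smallest values found.

```python
import math

# Made-up symmetric matrix; its characteristic values are 1 and 3.
A = [[2.0, 1.0], [1.0, 2.0]]


def quadratic_form(A, x):
    """Return x . Ax for a square matrix A and a vector x."""
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return sum(xi * yi for xi, yi in zip(x, Ax))


# Sample the unit sphere in R^2 (the unit circle) at N equally spaced angles.
N = 3600
values = [
    quadratic_form(A, (math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N)))
    for k in range(N)
]
largest = max(values)
smallest = min(values)
print(largest, smallest)
```

On the unit circle this quadratic form equals 2 + sin 2t at the point (cos t, sin t), so the sampled extremes agree with the characteristic values 3 and 1, attained along the unit characteristic vectors in the directions (1, 1) and (1, -1).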

Example 3. In this final example we show how the method of steepest 
descent can be used to find solutions of systems of nonlinear equations where 
the number of equations may or may not be the same as the number of 
unknowns. 

Consider the system of m nonlinear equations in n unknowns, 

$$\begin{aligned} f^{(1)}(x_1, \ldots, x_n) &= 0 \\ &\ \,\vdots \\ f^{(m)}(x_1, \ldots, x_n) &= 0 \end{aligned}$$

which we shall henceforth write in vector form, 

$$f(x) = 0,$$

and assume that f is a differentiable function which maps $R^n$ (or an open subset 
of it) into $R^m$. We may begin our search not even knowing whether any solutions 
exist. Our goal of course is to find a solution if any exist, and if there are none, 
to find a vector which comes as close as possible to being one. In either case, 
what we are trying to do is to make $\|f(x)\|$ as small as possible. We shall 
sometimes speak of it as an attempt to make $\|f(x)\|^2$ as small as possible, bearing 
in mind that the two ways of saying it come down to the same thing. Even if 
there is no value of x which makes f(x) vanish, we shall call x a least squares 
solution of f(x) = 0 if it makes $\|f(x)\|$ a minimum, that is, if it makes f(x) as 
close to the zero vector as possible. 

Since $\|f(x)\|^2 = f(x) \cdot f(x)$, an application of (12.8-6) gives 

$$\operatorname{grad} \|f(x)\|^2 = 2[f'(x)]^T f(x),$$

where f'(x) is the (m x n) Jacobian matrix of f, so its transpose is an (n x m) 
matrix which maps f(x) into $R^n$. 

Since $[f'(x)]^T f(x)$ is a vector in $R^n$ pointing from x in the direction of most 
rapid increase of $\|f(x)\|$, to decrease $\|f(x)\|$ most rapidly we should move from x in 
the opposite direction. This suggests that we begin our search for a solution of 
f(x) = 0 by making an initial guess $x_0$. Then, for proper choice of some positive 
number $\alpha_0$, the next approximation, $x_1 = x_0 - \alpha_0 [f'(x_0)]^T f(x_0)$, should be such that 
$\|f(x_1)\|$ is smaller than $\|f(x_0)\|$. Repetition of this process leads to the 
sequence-generating procedure 

$$x_{n+1} = x_n - \alpha_n [f'(x_n)]^T f(x_n).$$

For a good choice of the $\alpha_n$'s, $\{\|f(x_n)\|\}_0^\infty$ should be a decreasing sequence of 
positive numbers converging to a minimum value of $\|f(x)\|$, in which case the 
sequence $\{x_n\}_0^\infty$ could be expected to converge to either a solution or a least 
squares solution of f(x) = 0. 

Skill in the use of this method is a matter of making good choices for the 
$\alpha_n$'s. A bad choice of $\alpha_n$ would be one which would make $\|f(x_{n+1})\|$ bigger than 
$\|f(x_n)\|$. This would mean that even though we started out from $x_n$ in the right 
direction, we went too far. To remedy this mistake, one might then repeat the 
determination of $x_{n+1}$, using a value for $\alpha_n$ only half as big. 
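
The procedure just described, including the remedy of halving $\alpha_n$ after a bad step, can be sketched as follows. Everything specific here is an assumption made for illustration: the system (a circle and a line, so m = n = 2), the starting guess, the initial value $\alpha = 1$, the stopping tolerance, and the function names.

```python
import math


def f(x):
    """Made-up system f(x) = 0: the unit circle and the line x1 = x2."""
    return [x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]]


def jacobian(x):
    """The (m x n) Jacobian matrix f'(x) of the system above."""
    return [[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]]


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def ascent_direction(x):
    """Return [f'(x)]^T f(x), the direction of most rapid increase of ||f(x)||."""
    J, fx = jacobian(x), f(x)
    return [sum(J[i][j] * fx[i] for i in range(len(fx))) for j in range(len(x))]


def steepest_descent(x0, alpha=1.0, tol=1e-10, max_steps=200, max_halvings=40):
    x = list(x0)
    for _ in range(max_steps):
        current = norm(f(x))
        if current < tol:
            break
        d = ascent_direction(x)
        for _ in range(max_halvings):
            trial = [xi - alpha * di for xi, di in zip(x, d)]
            if norm(f(trial)) < current:
                x = trial        # the step decreased ||f||, so accept it
                break
            alpha *= 0.5         # we went too far: halve alpha and try again
        else:
            break                # no decrease found; give up
    return x


solution = steepest_descent([1.0, 0.0])
print(solution, norm(f(solution)))
```

With these choices the iterates reach the line $x_1 = x_2$ after one accepted step and then move along it toward the solution $(\sqrt{2}/2, \sqrt{2}/2)$. This sketch keeps a halved $\alpha$ for all later steps; that is one simple policy, not the only reasonable one.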

As you might have begun to suspect, this method of steepest descent is 
likely to be tediously slow in converging, and in this respect it is much inferior to 
Newton’s method, if Newton’s method converges at all. In cases where 
Newton’s method can be used if one starts close enough to the desired solution, a 
good strategy might be to begin with the method of steepest descent and then 
switch to Newton’s method when indications are that a solution is near. 


EXERCISES 

1. Prove that, if a function f from an open set in $R^n$ to $R^m$ is differentiable at a point 
a, the derivative f'(a) is uniquely determined. Hint: Suppose there were two T's, $T_1$ and 
$T_2$, satisfying (12.1-4). Use the triangle inequality to show that 

$$\lim_{h \to 0} \frac{\|(T_1 - T_2)h\|}{\|h\|} = 0$$

and from this explain in detail why $\|T_1 - T_2\| = 0$. 

2. Prove that if f (as in Exercise 1) is differentiable at a, it is continuous at a. 

3. Suppose the real-valued function f of $x = (x_1, \ldots, x_n)$, defined in a neighborhood 
of $a = (a_1, \ldots, a_n)$, has first partial derivatives $\partial f/\partial x_i = f_i$ at each point of the neighborhood, 
and that these are continuous at a. Show that f is differentiable at a. Hint: The proof 
rests on n applications of the law of the mean for functions of one real variable, together 
with the addition and subtraction of n - 1 terms, as is done in (7.1-2) for the case n = 2. 
For the general case some good notation is very helpful. We illustrate for the case n = 3. 
Let $h = (h_1, h_2, h_3)$, $h^{(0)} = (0, 0, 0)$, $h^{(1)} = (h_1, 0, 0)$, $h^{(2)} = (h_1, h_2, 0)$, $h^{(3)} = (h_1, h_2, h_3)$. Verify 
that 

$$f(a + h) - f(a) = \sum_{k=1}^{3} [f(a + h^{(k)}) - f(a + h^{(k-1)})] = \sum_{k=1}^{3} h_k f_k(a + (1 - \theta_k)h^{(k-1)} + \theta_k h^{(k)})$$

for certain numbers $\theta_k$, where $0 < \theta_k < 1$. Now complete the argument. 

4. Consider the function G(y) = y defined on some open set S in $R^n$. Prove that G is 
differentiable on S with G'(y) = I (the identity operator on $R^n$) for each y in S. 

5. Find the flaws in the following. If we let n = 1 in Exercise 4, we have that if 
G(y) = y on some open interval (a, b), then G' exists and is equal to I at all points of 
(a, b). And clearly G is the identity function on (a, b). Therefore G' = G on (a, b). Hence 
$G(y) = ce^y$, proving that $y = ce^y$ for all y in (a, b). 

6. Given $T \in \mathcal{L}(R^n, R^m)$, define f(x) = Tx for each x in an open set U in $R^n$. Prove 
that f is differentiable at each point of U and that f'(x) = T for each x. 

7. (a) Given a function f from an open set in $R^n$ to $R^m$ and a fixed vector a in $R^m$, 
show that if f is differentiable at $x_0$, then so is the function F from $R^n$ to R defined by 
$F(x) = a \cdot f(x)$, the differential being given by $dF(x_0, h) = a \cdot f'(x_0)h$. 

(b) Show that F is continuous at $x_0$ if f is, even when f is not differentiable. 

8. Suppose that f is a differentiable function from an open connected set G in $R^n$ to 
$R^m$, and suppose that its derivative at each point is the zero transformation. Show that f 
has the same value at all points of G. Hint: Treat first the case where G is convex. 

This is a generalization of Theorem V of §1.2 and also of Theorem VII of §7.4. 

9. (a) Let g be a function from [0, 1] in R to $R^n$, defined by g(t) = u + tw, where u 
and w are fixed vectors in $R^n$. Find dg(t, h) for $t \in [0, 1]$ and $h \in R$. Explain the difference 
between g'(t) as a member of $\mathcal{L}(R, R^n)$ and g'(t) regarded as a vector in $R^n$. 

(b) If f is a differentiable function from an open set in $R^n$ to $R^m$ and F(t) = f(g(t)), where g 
is as in part (a), find dF(t, h) and F'(t). 

10. Show that the function f from R 2 to R defined as follows, 

f(x ’ y) = x*+7 if (*,y)*(0,0), 
and f(0, 0) = 0, 

has partial derivatives with respect to both x and y at (0, 0) but is not continuous there, 
and hence is certainly not differentiable at the origin. 

11. Let g(x, y) = - P 4 if (*, y) * (0, 0), and 

X ~r y 

g(0, 0) = 0. 

Show that g has directional derivatives in all directions at the origin, but that it is not 
differentiable there. Is g continuous at the origin? 

12. Define F(x, y) = (2x 2 - y)(y -x 2 ) if y ^x 2 or 2 jc 2 ^ y, and define 

F(x, y) = (2x2 ~ y >(y~ x2 ) j f 0<jc 2 <y <2x 2 . 
jcy 

Show that (a) F is continuous at each point, including the point (0, 0), and (b) F has a 
directional derivative in every direction at (0, 0), but (c) F is not differentiable at (0, 0). 

13. With f as in the opening paragraph of §12.21, suppose f is differentiable at x = a. 

(a) Prove that $D_u f(a) = -D_{-u} f(a)$. 

(b) Assuming that $v \ne -u$, express $D_u f(a) + D_v f(a)$ in terms involving just one directional 
derivative at a. 

14. If f is a function from some subset A of $R^n$ to $R^m$, prove that f is continuous at 
the point a of A if and only if each of its component functions is continuous at a. 

15. Assume that f is a function from an open subset U of $R^n$ to $R^m$, and that a is a 
point of U at which all mn of the first partial derivatives of the component functions of f 
exist. Prove that if T is that element of $\mathcal{L}(R^n, R^m)$ whose standard representation is the 
Jacobian matrix of f at a, then either f is differentiable at a with derivative T, or else 

$$\lim_{h \to 0} \frac{\|f(a + h) - f(a) - Th\|}{\|h\|}$$

does not exist. Hint: Show that the indicated limit, if it exists, must be 0. 

16. Prove that if T is an invertible linear operator on $R^n$, and $\eta$ is a function from some 
neighborhood of the origin in $R^n$ to $R^n$, and $\lim_{x \to 0} T[\eta(x)] = 0$, then 

$$\lim_{x \to 0} \eta(x) = 0.$$


17. Observe that the function $f : R \to R$ defined by $f(x) = x^3$ is continuously 
differentiable everywhere and that the derivative at 0 is singular (i.e., not invertible). The function 
f is nevertheless invertible (in every neighborhood of the origin, in fact). Construct 
functions from $R^n$ to $R^n$ exhibiting these properties. 

18. Show that it is possible to prove the inverse function theorem in the special case 
where n = 1 without assuming the function to be continuously differentiable. It is 
sufficient that it merely be differentiable and that its derivative be different from zero at 
each point of an open interval (a, b). Show that one can then prove that the function has 
an inverse on (a, b), that its inverse function is differentiable at each point of its domain, 
and that the derivative of the inverse can be expressed very simply in terms of the 
derivative of the given function. Hint: Use Exercise 21 of §3.3. 

The authors do not know whether, in Theorem VIII, the hypothesis that f be 
continuously differentiable can be replaced by the hypothesis that f merely have an 
invertible derivative throughout some neighborhood of x = a. 

19. Consider the function $f : R \to R$ defined as follows: 

f(x) = x if x is rational, 

$f(x) = x^2 + x$ if x is irrational. 

Prove that f is not only continuous at 0 but differentiable there with a positive derivative. 
Then show that f is not invertible in any neighborhood of 0. The fact that f has a positive 
derivative at 0 does imply that there is some neighborhood of 0 in which f(x) > f(0) if 
x > 0, and f(x) < f(0) if x < 0, but there is no neighborhood of 0 throughout which f is 
either always increasing or always decreasing. 

20. Suppose that W is an open subset of $R^5$ and that $(a_1, a_2, 0, 0, 0) \in W$. Show that 
$\{(x_1, x_2) : (x_1, x_2, 0, 0, 0) \in W\}$ is an open subset of $R^2$. 

21. Suppose that W is an open subset of $R^5 = R^2 \times R^3$ and that $(a, b) \in W$. Prove that 
there are open sets $\mathcal{U}$ and $\mathcal{V}$ in $R^2$ and $R^3$ respectively such that $a \in \mathcal{U}$, $b \in \mathcal{V}$, and 
$\mathcal{U} \times \mathcal{V} \subset W$. Notice that this argument can be extended to prove that every open set in 
$R^{p+q} = R^p \times R^q$ is the union of Cartesian products of open sets in $R^p$ with open sets in $R^q$. 

How can one distinguish geometrically between those subsets of the plane which are 
Cartesian products of subsets of the line and those that are not? 

22. Write out in complete detail, with accompanying figures, a proof of Theorem X in 
the special case where p = q = 1. Draw a figure for the case which illustrates the 
possibility that the set S may consist of several disjoint sets (intervals). 

23. (a) Suppose there exists a function u = F(x, y), continuously differentiable in an 
open set in $R^2$, with values in R, and invertible, so that F cannot have the same value at 
two different points. Show that this leads to a contradiction. Hint: Either $\partial F/\partial x$ or 
$\partial F/\partial y$ must be nonzero at some point. 

(b) Show that there cannot exist an invertible continuously differentiable mapping from 
an open subset of $R^3$ to $R^2$. Hint: Use an implicit function argument to reduce the 
situation to one in which part (a) can be used. 

24. Let $f(T) = T^{-1}$ when $T \in \Omega$. [See §11.10 for definition of $\Omega$ and other relevant 
information, especially the lemma and the fact that $\Omega$ is an open set in $\mathcal{L}(R^n)$.] Thus f is a 
function from $\Omega$ to itself. Prove that f is differentiable at each point of $\Omega$ and that its 
differential is given by $df(T, H) = -T^{-1} H T^{-1}$ for each H in $\mathcal{L}(R^n)$. Hint: The essential 
tools are the lemma from §11.10 and the formula in Exercise 21, Chapter 11, for the case 
k = 1. Note that when $\|H\|$ is sufficiently small 

$$(T + H)^{-1} = [T(I + T^{-1}H)]^{-1} = (I + T^{-1}H)^{-1} T^{-1}.$$

Now use the result from Exercise 21, Chapter 11 with the T there replaced by $-T^{-1}H$. 

25. Using the method of steepest descent as presented in §12.21 as a guide, devise a 
method of steepest ascent and use it to solve Exercise 11 of §6.3. The calculations should 
be carried out on a programmable calculator. 

26. Use Newton’s method to find to four places of decimals a number x such that 
$\log_e x - e^{-x} = 0$. This does not require a programmable calculator. Any pocket calculator 
with a few memory locations makes it possible to generate the sequence of $x_n$’s defined by 
(12.3-1) without writing down any intermediate results. A good starting approximation can 
be got by sketching the graphs of $e^{-x}$ and $\log_e x$ and estimating the abscissa of their point of 
intersection. 

27. Using a programmable calculator, solve the nonlinear system (12.3-3) by 
Newton’s method. 

28. Complete the derivation of (12.8-1). 

29. Let points $P_1(x_1, y_1), \ldots, P_m(x_m, y_m)$ be given in the xy-plane, not all on the same 
vertical line. They may or may not lie on some straight line, but sometimes it is important 
to find a straight line $y = \alpha_1 x + \alpha_2$ that “fits” the points as accurately as possible in the 
sense that the sum of the squares 

$$\phi(\alpha_1, \alpha_2) = (P_1 Q_1)^2 + (P_2 Q_2)^2 + \cdots + (P_m Q_m)^2$$

is as small as possible, where $Q_i$ is the point in which the line $x = x_i$ intersects the line 
$y = \alpha_1 x + \alpha_2$. See Fig. 89. 

Show that the problem of finding $\alpha_1$ and $\alpha_2$ to make $\phi(\alpha_1, \alpha_2)$ a minimum is the same as 
the problem of finding the least squares solution of the system of m equations 

$$\begin{aligned} x_1 \alpha_1 + \alpha_2 &= y_1 \\ x_2 \alpha_1 + \alpha_2 &= y_2 \\ &\ \,\vdots \\ x_m \alpha_1 + \alpha_2 &= y_m \end{aligned}$$

in the two unknowns $\alpha_1$ and $\alpha_2$. In this case the b of (12.8-7) is $(y_1, \ldots, y_m)$, the x of 
(12.8-7) is $(\alpha_1, \alpha_2)$ and A is a matrix of m rows and two columns. What is the matrix $A^T A$ 
in this case, and how do you know that it is nonsingular? 

30. Fill in the details of the first argument discussed for showing that the value of x 
given by (12.8-9) is that one for which $\phi$ has its absolute minimum value. 



13 / DOUBLE AND 
TRIPLE INTEGRALS 


13 / PRELIMINARY REMARKS 

This chapter is designed in such a way that it does not require of the student any 
previous knowledge of the subject of multiple integrals. Most students of 
elementary calculus will have had some acquaintance with double and triple 
integrals before coming to a course in advanced calculus. The extent of this 
acquaintance will vary considerably with the student, however, and it seems 
desirable here to start from the beginning. 

Multiple integrals have important applications in geometry and the sciences. 
For these applications the concept of the integral (double or triple) is vital, quite 
apart from the important matter of knowing how to calculate the value of the 
integral. It is therefore very important for the student to pay attention to the 
definitions of double and triple integrals. 

Naturally, the study of multiple integrals builds upon prior knowledge of 
ordinary definite integrals 

$$\int_a^b f(x)\,dx.$$

Such integrals may be called single integrals, since they involve functions of a 
single independent variable, while multiple integrals involve functions of several 
independent variables. 

In the study of integration, some of the salient matters to be considered are: 
the definition of the integral, the type of function which is integrable (i.e., which 
has an integral in the sense of the definition), properties of the integral, methods 
of finding the values of integrals, and applications. Of these matters, the question 
as to what types of functions are integrable is the most difficult. For most 
ordinary applications it is sufficient to deal with continuous functions. In Chapter 
18, we shall consider the theory of integration, paying special attention to 
questions of integrability; in particular, we shall prove in that later chapter that 
continuous functions are integrable. In the present chapter we shall avoid 
discussions of integrability, taking for granted the existence of the integrals 
which come to hand. 

13.1 / MOTIVATIONS 

Consider a thin plane sheet of metal covering a certain region R in the xy-plane. 
If the sheet is of uniform thickness and texture, the mass of any portion of the 
sheet will be directly proportional to the area of that portion. The constant of 
proportionality may be called the areal density of the sheet; it is the mass per 
unit area. If we denote this density by $\sigma$, the mass $\Delta M$ of an area $\Delta A$ of the 
sheet is 

$$\Delta M = \sigma\,\Delta A.$$ (13.1-1)

How can we locate the center of mass (which is the same as the center of 
gravity) of the sheet? An intuitive attack on this question may be made as 
follows. Divide the sheet into a large number, say n, of small pieces. Denote the 
areas of these pieces by $\Delta A_1, \ldots, \Delta A_n$, and their masses by 
$\Delta M_1, \ldots, \Delta M_n$. If the maximum dimensions of the pieces are all sufficiently 
small, it is a reasonable approximation to regard each piece as a particle, all 
of the mass of the piece being thought of as concentrated at some point within 
the piece. We thus arrive at the picture of a system of particles, the mass 
$\Delta M_k$ being located at a point $P_k(x_k, y_k)$ in the kth piece, as in Fig. 90. The 
center of mass of this finite system of particles is at the point $(\bar{x}, \bar{y})$, where 

$$\begin{aligned} M\bar{x} &= x_1\,\Delta M_1 + \cdots + x_n\,\Delta M_n, \\ M\bar{y} &= y_1\,\Delta M_1 + \cdots + y_n\,\Delta M_n, \\ M &= \Delta M_1 + \cdots + \Delta M_n. \end{aligned}$$ (13.1-2)

It is plausible to suppose that, as the number of pieces is increased and the 
greatest dimension of all the pieces approaches zero, the point $(\bar{x}, \bar{y})$ located by 
formulas (13.1-2) will approach a limiting position which is the exact center of 
mass of the plate. If this limiting point has co-ordinates $(\bar{x}, \bar{y})$, we see from 
(13.1-1) and (13.1-2) that 

$$M\bar{x} = \lim (x_1 \sigma\,\Delta A_1 + \cdots + x_n \sigma\,\Delta A_n),$$ (13.1-3)

with a similar formula for $\bar{y}$. The limit here is to be understood in much the same 
way as the limit defining a definite single integral (see §1.63). In the present case 
the limit is a double integral, and we write 

$$M\bar{x} = \iint_R x\sigma\,dA.$$ (13.1-4)

The expression on the right here is called the double integral of $x\sigma$ over the 
region R. There will be a similar formula for $\bar{y}$. 
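
The finite sums (13.1-2) are easy to evaluate by machine. The sketch below is an illustration under assumptions of our own choosing: the plate is the triangle with vertices (0, 0), (1, 0), (1, 1), the density is constant, the pieces are small squares of side 1/n, and each square whose midpoint lies in the triangle is treated as a particle at that midpoint. The name `center_of_mass` is hypothetical.

```python
def center_of_mass(n, sigma=1.0):
    """Approximate the center of mass of the triangle 0 <= y <= x <= 1.

    The plate is cut into squares of side h = 1/n; a square is counted when
    its midpoint lies in the triangle, and its mass sigma * h * h is placed
    at that midpoint, as in (13.1-1) and (13.1-2).
    """
    h = 1.0 / n
    M = Mx = My = 0.0
    for i in range(n):
        for j in range(n):
            x = (i + 0.5) * h
            y = (j + 0.5) * h
            if y <= x:
                dM = sigma * h * h   # Delta M = sigma * Delta A
                M += dM
                Mx += x * dM
                My += y * dM
    return Mx / M, My / M


x_bar, y_bar = center_of_mass(200)
print(x_bar, y_bar)
```

The exact center of mass of this triangle is (2/3, 1/3), and the computed point approaches it as n increases.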

We shall see that double integrals can arise conceptually from many 
different geometrical or physical problems. For the present we select one further 
illustration, this time from geometry. 

Consider a surface z = f(x, y), where f is defined and continuous in a 
bounded closed region R of the xy-plane. Suppose that the values of f are 
everywhere positive, or perhaps zero, so that the surface never falls below the 
xy-plane. We now pose the problem: How can we calculate the volume which is 
under the surface and directly above the region R? This volume will be bounded 
laterally by a cylindrical surface composed of lines parallel to the z-axis erected 
at the points of the boundary of R. 

The procedure for arriving at a formulation of the volume is very similar to 
the procedure used in expressing the area under a curve y = f(x) from x = a to 
x = b as the definite integral $\int_a^b f(x)\,dx$. We divide the region R into a large 
number (say n) of small pieces (called subregions) of areas $\Delta A_1, \ldots, \Delta A_n$. This 
divides the volume under consideration into thin columns. Let the portion of the 
volume directly above the small area $\Delta A_k$ be denoted by $\Delta V_k$, so that the total 
volume is 

$$V = \Delta V_1 + \cdots + \Delta V_n.$$

If $P_k(x_k, y_k)$ is any point in the kth subregion of R, let $z_k = f(x_k, y_k)$ be the 
distance from $P_k$ up to the surface (see Fig. 91). If the dimensions of $\Delta A_k$ are 
sufficiently small, the expression 

$$z_k\,\Delta A_k = f(x_k, y_k)\,\Delta A_k$$

is a good approximation to the volume $\Delta V_k$. In fact, denoting by $m_k$ and $M_k$ the 
minimum and maximum values, respectively, of f(x, y) in the kth subregion, we have 

$$m_k\,\Delta A_k \le \Delta V_k \le M_k\,\Delta A_k.$$ (13.1-5)

Since $z_k$ lies between $m_k$ and $M_k$, it is clear that 
$\Delta V_k$ and $z_k\,\Delta A_k$ do not differ by more than $(M_k - m_k)\,\Delta A_k$. It thus appears very 
plausible that the sum 

$$f(x_1, y_1)\,\Delta A_1 + \cdots + f(x_n, y_n)\,\Delta A_n$$ (13.1-6)

is a good approximation to V, and that we have exactly (using a summation 
symbol to abbreviate the expression (13.1-6)), 

$$V = \lim \sum_{k=1}^{n} f(x_k, y_k)\,\Delta A_k,$$ (13.1-7)

the limit being understood in the sense that the number n is increased 
indefinitely and the maximum dimension of the subregions $\Delta A_1, \ldots, \Delta A_n$ 
approaches zero. The choice of the point $P_k$ in the kth subregion is arbitrary, and 
the exact shape of the subregions is immaterial. The limit on the right in (13.1-7) 
is equal to a double integral; the notation for the integral is 

$$V = \iint_R f(x, y)\,dA.$$

If we compare the limits in (13.1-3) and (13.1-7), we see that they have the 
same form. In fact, the limit in (13.1-3) is the special case of that in (13.1-7) for 
which $f(x, y) = x\sigma$ (which happens to be independent of y). Limits of sums 
having the general form (13.1-6) occur in a variety of contexts, with widely 
different interpretations. The mathematical properties common to all such limits 
furnish us with a starting point for the general theory of double integrals. 
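
The bracketing inequality (13.1-5) can be watched at work in a simple case. In the sketch below, the surface $z = x^2 + y^2$ over the unit square is our own choice, made because f increases in each variable, so that the minimum and maximum on each square cell occur at its lower-left and upper-right corners; the name `lower_upper_sums` is hypothetical. Summing $m_k\,\Delta A_k$ and $M_k\,\Delta A_k$ over all cells gives two numbers between which the volume must lie.

```python
def f(x, y):
    return x * x + y * y


def lower_upper_sums(n):
    """Return the sums of m_k * dA and M_k * dA over an n-by-n grid of cells.

    The region is the unit square 0 <= x, y <= 1. Because f increases in
    both x and y there, m_k is attained at the lower-left corner of each
    cell and M_k at the upper-right corner.
    """
    h = 1.0 / n
    lower = upper = 0.0
    for i in range(n):
        for j in range(n):
            m_k = f(i * h, j * h)
            M_k = f((i + 1) * h, (j + 1) * h)
            lower += m_k * h * h
            upper += M_k * h * h
    return lower, upper


lower, upper = lower_upper_sums(100)
print(lower, upper)
```

The volume under this surface is 2/3. For an n-by-n grid the two sums differ by exactly 2/n, so any sum of the form (13.1-6) built on the same cells is within 2/n of the volume.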


13.2 / DEFINITION OF A DOUBLE INTEGRAL 


Let R be a closed, bounded region in the xy-plane, and let f(x, y) be a function 
defined in R. In a very general theory of integration, we might seek to place no 
more restrictions on the function f and the region R than are absolutely 
necessary for the development of the theory. In the interests of simplicity, 
however, we shall make rather severe limitations on R, and we shall assume at 
the outset that the function f is continuous in R. Later it will be possible (and 
desirable) to broaden the treatment so that certain kinds of discontinuities of f 
are permitted. 

The term “region” was defined in §5.1. We are now concerned with closed, 
bounded regions. If R is such a region, it has an interior and a boundary. Since R 
is closed, the boundary is part of the region. The limitations we place on R are in 
the nature of assumptions about the character of the boundary. We have in 
mind, roughly speaking, that the boundary of R shall consist of a finite number 
of arcs of smooth curves joined together to form a closed curve, or possibly 
several (but a finite number of) such curves. A smooth curve is defined to be a 
curve with a continuously turning tangent. Circles, parabolas, and straight lines 
are among the simplest kinds of smooth curves. It is more difficult than one 
might suppose to be precise in describing the boundary of a region; we shall not 
attempt to express our assumptions more exactly than in the above statement. 
Hereafter in this chapter, in speaking of a region R in connection with a double 
integral, the foregoing assumptions will be taken for granted without explicit 
mention. 


In defining a double integral, we start from approximating sums having the 
appearance of (13.1-6), but the subregions $\Delta A_k$ are chosen in a prescribed 
manner, and are not arbitrary in shape. Let two sets of lines be drawn, one set 
parallel to the x-axis, the other set parallel to the y-axis (see Fig. 92). The 
spacing of the lines need not be regular, but the spacing should be close enough 
so that the rectangles formed by the intersections of the two sets of lines are 
small in comparison with R. The network thus formed in the xy-plane is called a 
rectangular partition; one of the rectangles of the network is called a cell. 
Some of the cells will belong entirely to R; others will contain points which do 
not belong to R. For our purposes we discard these latter cells, retaining only those 
which do not go outside R (these are shaded in Fig. 92). Let the retained cells be 
numbered (in any order), and denote their areas by $\Delta A_1, \ldots, \Delta A_N$, N being the 
number of cells retained. It is convenient to refer to the kth cell as the cell $\Delta A_k$. 
Now let $P_k$, with co-ordinates $(x_k, y_k)$, be any point in the cell $\Delta A_k$, and form the 
sum 

$$f(x_1, y_1)\,\Delta A_1 + \cdots + f(x_N, y_N)\,\Delta A_N = \sum_{k=1}^{N} f(x_k, y_k)\,\Delta A_k.$$ (13.2-1)

The limit of this sum is defined as the double integral of the function f over the 
region R: 

$$\iint_R f(x, y)\,dA = \lim \sum_{k=1}^{N} f(x_k, y_k)\,\Delta A_k.$$ (13.2-2)

The integral is the limit of the sum in the following sense: The integral is the real 
number to which we may approximate as closely as we please by the sums 
(13.2-1); to get any desired degree of approximation all that is necessary is to 
make the dimensions of all the relevant cells sufficiently small. The choice of $P_k$ 
in $\Delta A_k$ is arbitrary, and the rectangular partition itself is arbitrary. In choosing 
partitions so that all the cells have very small dimensions, it is of course 
apparent that N will become very large. 

The precise meaning of (13.2-2) is then as follows: If $\epsilon$ is any positive 
number, there is some corresponding positive number $\delta$ which depends upon $\epsilon$ 
(and also upon f and R) such that the inequality 

$$\left| \iint_R f(x, y)\,dA - \sum_{k=1}^{N} f(x_k, y_k)\,\Delta A_k \right| < \epsilon$$ (13.2-3)

holds for all rectangular partitions in which the maximum cell dimensions are 
less than $\delta$, and for all choices of the point $P_k$ in the kth cell. 

We take it for granted that the continuous function f is integrable, i.e., that
the approximating sums (13.2-1) do actually converge to a limit in the sense just
specified. This assumption is examined more closely in Chapter 18.

The student may already have speculated upon the fact that in forming the sum
(13.2-1) we dealt only with those cells which belong entirely to R. We might have
proceeded somewhat differently and retained not only those cells just mentioned,
but also all those cells which touch the region R in any way (see Fig. 93).

Fig. 93.

This procedure would give us more terms. The limiting value of the approximating
sums would be the same as in (13.2-2), however. For, let $\Delta S$ denote the area
of the additional cells, and let M be the maximum of |f(x, y)| in R. Then the
contribution of these additional cells to the approximating sum would not exceed
$M\,\Delta S$. But $\Delta S$ approaches zero as the mesh of the partition is made finer and the
maximum cell dimension approaches zero. Hence $M\,\Delta S \to 0$. In a fully detailed
treatment of these matters the proof that $\Delta S \to 0$ turns out to depend on the
nature of the boundary of R. The assumptions we made concerning this
boundary are sufficient to insure that $\Delta S \to 0$.
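The two ways of choosing cells can be compared numerically. The following is only a sketch under assumptions that are not in the text: the region R is taken to be the unit disk, the partition is a uniform n-by-n grid on the square from -1 to 1, the point in each cell is its center, and the name `disk_sums` is hypothetical.

```python
import math

def disk_sums(f, n):
    """Approximating sums for f over the unit disk on a uniform n-by-n grid.

    Returns (inner, outer): inner keeps only the cells lying entirely in the
    disk; outer also keeps every cell that touches the disk.
    """
    h = 2.0 / n
    inner = outer = 0.0
    for i in range(n):
        x0, x1 = -1.0 + i * h, -1.0 + (i + 1) * h
        # smallest and largest |x| over this column of cells
        near_x = 0.0 if x0 <= 0.0 <= x1 else min(abs(x0), abs(x1))
        far_x = max(abs(x0), abs(x1))
        for j in range(n):
            y0, y1 = -1.0 + j * h, -1.0 + (j + 1) * h
            near_y = 0.0 if y0 <= 0.0 <= y1 else min(abs(y0), abs(y1))
            far_y = max(abs(y0), abs(y1))
            # value at the cell center times the cell area
            term = f(0.5 * (x0 + x1), 0.5 * (y0 + y1)) * h * h
            if far_x ** 2 + far_y ** 2 <= 1.0:    # farthest corner is in the disk
                inner += term
            if near_x ** 2 + near_y ** 2 <= 1.0:  # nearest point is in the disk
                outer += term
    return inner, outer

# With f = 1 both sums approximate the area of the disk, and their difference
# is the area of the additional cells, which shrinks as the grid is refined.
for n in (50, 100, 200):
    inner, outer = disk_sums(lambda x, y: 1.0, n)
    print(n, inner, outer, outer - inner)
```

For f = 1 the inner sum stays below the area of the disk and the outer sum stays above it, and the gap between them plays the role of the area of the additional cells in the argument above.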

The notation 


$$\iint_R f(x, y)\,dx\,dy \quad\text{instead of}\quad \iint_R f(x, y)\,dA \tag{13.2-4}$$

is frequently used for the double integral. The letter which is used for area is of
course immaterial. Thus, if we used S for the area of R, and $\Delta S_1, \ldots, \Delta S_N$ for
the areas of the cells in the partition, we might denote the double integral by

$$\iint_R f(x, y)\,dS.$$

13.21 / SOME PROPERTIES OF THE DOUBLE INTEGRAL 

A number of important properties of double integrals follow readily from the 
definition (13.2-2). Two of these properties are embodied in the formulas 

$$\iint_R c\,f(x, y)\,dA = c \iint_R f(x, y)\,dA, \tag{13.21-1}$$

$$\iint_R [f(x, y) + g(x, y)]\,dA = \iint_R f(x, y)\,dA + \iint_R g(x, y)\,dA. \tag{13.21-2}$$

In (13.21-1) c is a constant; a constant factor may be taken outside the integral
sign. In (13.21-2) f and g are any two functions which are integrable in R; the
integral of a sum is the sum of the integrals. These formulas are immediate
consequences of the definition (13.2-2) and the fundamental theorems about
limits (see §1.6).

Another important property concerns the situation when the region R is composed
of two regions $R_1$, $R_2$ which have no common points except for parts of their
boundaries (see Fig. 94). The formula here is

$$\iint_R f(x, y)\,dA = \iint_{R_1} f(x, y)\,dA + \iint_{R_2} f(x, y)\,dA. \tag{13.21-3}$$

Fig. 94.

The subregions $R_1$, $R_2$ are of course subject to the same assumptions as R as far
as their boundaries are concerned.

13.22 / INEQUALITIES. THE MEAN-VALUE THEOREM 

It is at once apparent from the definition of the double integral that 

$$\iint_R f(x, y)\,dA \ge 0 \quad\text{if}\quad f(x, y) \ge 0 \text{ in } R. \tag{13.22-1}$$

Hence, if $f(x, y) \le g(x, y)$ in R, we have

$$\iint_R f(x, y)\,dA \le \iint_R g(x, y)\,dA.$$

Now

$$-|f(x, y)| \le f(x, y) \le |f(x, y)|,$$

and therefore

$$-\iint_R |f(x, y)|\,dA \le \iint_R f(x, y)\,dA \le \iint_R |f(x, y)|\,dA.$$

This result can be written

$$\left| \iint_R f(x, y)\,dA \right| \le \iint_R |f(x, y)|\,dA. \tag{13.22-2}$$

Let A be the area of R. Then, taking f(x, y) = 1, we see that

$$\iint_R f(x, y)\,dA = \lim \sum \Delta A_k = A.$$

Hence, for any constant c,

$$\iint_R c\,dA = cA. \tag{13.22-3}$$

Suppose now (returning to the case of an arbitrary continuous f) that m, M
are numbers such that, in R,

$$m \le f(x, y) \le M.$$

Then

$$mA = \iint_R m\,dA \le \iint_R f(x, y)\,dA \le \iint_R M\,dA = MA,$$

so that the integral of f has a value between mA and MA. Accordingly, we have
the following theorem:

THEOREM I (MEAN-VALUE THEOREM). If m and M are the minimum and
maximum values of f(x, y) in R, there is a number $\mu$ such that $m \le \mu \le M$
and

$$\iint_R f(x, y)\,dA = \mu A. \tag{13.22-4}$$

The number $\mu$ is called the average (or mean) value of f in R. We cannot as
a rule find the value of $\mu$ unless we know the value of the integral, so that $\mu$ is in
fact defined by (13.22-4). Nevertheless, even without exact knowledge of the
value of $\mu$, the fact that $m \le \mu \le M$ makes the formula (13.22-4) useful.

On occasion it is useful to know that there is some point P(x, y) in R at
which f takes on its average value $\mu$. When f is continuous, there is always at
least one such point P provided R is what is called a connected region, i.e., is
not composed of two or more closed regions completely separated from each
other. For a fuller discussion of what is meant by saying that a region is
connected, the student is referred to §17.7. This matter need not be considered
any further at present, however.
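As a small illustration of Theorem I (the function and region here are chosen for this note and do not come from the text), take f(x, y) = x + y on the unit square $0 \le x \le 1$, $0 \le y \le 1$, so that A = 1, m = 0, and M = 2. By (13.21-2) and the symmetry of the square,

$$\iint_R (x + y)\,dA = \iint_R x\,dA + \iint_R y\,dA = \tfrac{1}{2} + \tfrac{1}{2} = 1, \qquad \mu = \frac{1}{A} \iint_R (x + y)\,dA = 1.$$

Here $m \le \mu \le M$ as the theorem requires, and since the square is connected there is a point at which f takes the value $\mu$, for instance $(\tfrac{1}{2}, \tfrac{1}{2})$.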

13.23 / A FUNDAMENTAL THEOREM 

In our motivation of the definition of the double integral we used approximating 
sums which were obtained by decomposing the region R into subregions of 
arbitrary shape. In the definition (13.2-2), however, we restricted ourselves to 
subregions which are cells of a rectangular partition. It is important to know that 
we get the same limit of the approximating sums, no matter how the subdivision 
of R is made (as long as the pieces are sufficiently regular in shape that we can 
without ambiguity assign each of them an area). Assurance of this is given by the 
following theorem: 

THEOREM II. Let the double integral of the continuous function f over R be
defined by (13.2-2), using rectangular partitions. Let R be divided in any
manner into a finite number of subregions, of areas $\Delta A_1, \ldots, \Delta A_n$, the
shapes being arbitrary except as qualified in the previous paragraph. Let a
point $P_k(x_k, y_k)$ be chosen arbitrarily in $\Delta A_k$. Then

$$\iint_R f(x, y)\,dA = \lim \sum f(x_k, y_k)\,\Delta A_k.$$

Furthermore, it is not essential that the areas $\Delta A_1, \ldots, \Delta A_n$ completely fill
out the area of R, provided that the amount of area omitted approaches zero
in the limiting process.

This theorem appears to be intuitively evident from the geometrical
interpretation of the double integral as a volume, as explained in §13.1. A purely
analytical proof may be given. In this proof the property (13.21-3) plays an
important role. We forego the details.

Among other things, this theorem has the consequence that we are able to
define the double integral of a continuous scalar point function over a region R;
the integral is independent of co-ordinate systems, and is therefore a scalar
invariant. Before reading the following brief remarks on this subject, the student
will do well to read the first part of §10.5.

Let R be a plane region of the type assumed in §13.2, and let f(P) be a
continuous scalar point function defined in R. With an arbitrary choice of
rectangular co-ordinates in the plane, let the representation of f(P) be

$$f(P) = F(x, y),$$

P having co-ordinates (x, y). Consider the integral

$$\iint_R F(x, y)\,dA, \tag{13.23-1}$$

as defined earlier in this chapter. If some other rectangular co-ordinate system is
set up in the plane, denote the new co-ordinates of P by (x', y'), and the new
representation of f(P) by $\Phi(x', y')$. Then the integral

$$\iint_R \Phi(x', y')\,dA \tag{13.23-2}$$

has the same value as (13.23-1); for, the approximating sums converging to the
integral (13.23-2), formed for a rectangular partition of the x'y'-coordinate
system, will also converge to the integral (13.23-1), by virtue of Theorem II,
since $F(x, y) = \Phi(x', y')$ when (x, y) and (x', y') refer to the same point. It follows
that if we define the double integral of f(P) over R by

$$\iint_R f(P)\,dA = \iint_R F(x, y)\,dA, \tag{13.23-3}$$

then the integral is a scalar invariant.


13.3 / ITERATED INTEGRALS. CENTROIDS 

We shall now learn how to calculate the value of a double integral by performing 
two successive single integrations. Our initial explanation of this method rests on 
the geometric interpretation of the double integral as a volume, as in the 
discussion which culminates in formula (13.1-8). 
Our problem is to evaluate the integral

$$\iint_R f(x, y)\,dA, \tag{13.3-1}$$

where f is continuous in R. We shall for convenience assume that f(x, y) is
positive at each point of R. The integral (13.3-1) is equal to the volume under the
surface z = f(x, y) and directly above the region R. But we can calculate this
volume in the following manner:

Let the region R be contained between the lines y = a, y = b in the xy-plane
(see Fig. 96), and suppose that an arbitrary line y = y', where a < y' < b,
intersects the boundary of R just twice, as shown in Fig. 96. The co-ordinates
$X_1$, $X_2$ of these intersections will depend on y'.


Fig. 96.


Turning now to the three-dimensional picture (Fig. 95), consider the
intersection of the plane y = y' with the volume we are seeking. Let the area of this
plane section of the volume be denoted by S(y'). This area may be expressed as
an integral with respect to x, y' being held constant:

$$S(y') = \int_{X_1}^{X_2} f(x, y')\,dx. \tag{13.3-2}$$

If $\Delta y$ is a small positive number, the parallel plane $y = y' + \Delta y$ will also intersect
the solid, and the volume of the slice between these two planes will be
approximately $S(y')\,\Delta y$. Hence we should expect the total volume in question to
be

$$V = \int_a^b S(y)\,dy. \tag{13.3-3}$$

This is precisely the method which is followed in elementary calculus for finding
the volumes of various kinds of solids, notably solids of revolution, pyramids,
and so on. If we now drop the prime on y in (13.3-2), we see from (13.3-3) that
the desired volume is

$$V = \int_a^b \left( \int_{X_1}^{X_2} f(x, y)\,dx \right) dy.$$

It is usual to write this expression in the form

$$V = \int_a^b dy \int_{X_1}^{X_2} f(x, y)\,dx. \tag{13.3-4}$$

We have here what is called an iterated integral. Two successive integrations
are indicated. First we integrate with respect to x, holding y constant in the
integrand. The limits of integration $X_1$, $X_2$ generally depend on y, and are found
by referring to Fig. 96 and taking account of the equations of the curves which
form the boundary of R. The second integration is then performed with respect
to y, between the limits y = a, y = b; these limits are the algebraically smallest
and largest values, respectively, which y can assume in the region R.

We now have two expressions for the volume under the surface z = f(x, y), 
one given by a double integral, and the other by an iterated integral. Therefore, 
we can state a theorem. 

THEOREM III. Let the region R have its extremes in the direction of the y-axis
at y = a and y = b respectively (a < b). Let any line between these extremes
and parallel to the x-axis cut the boundary of R in exactly two points, so that
for a < y < b the boundary of R is formed by two curves

$$x = X_1(y), \quad x = X_2(y) \quad (X_1 < X_2).$$

Then a double integral over R can be expressed as an iterated integral:

$$\iint_R f(x, y)\,dA = \int_a^b dy \int_{X_1}^{X_2} f(x, y)\,dx. \tag{13.3-5}$$

It is of course also true that the double integral can be expressed as an iterated
integral in which the first integration is with respect to y, provided the
appropriate conditions on the boundary of R are fulfilled. See Fig. 97 and
formula (13.3-6).

$$\iint_R f(x, y)\,dA = \int_a^b dx \int_{Y_1}^{Y_2} f(x, y)\,dy. \tag{13.3-6}$$

Fig. 97. (Boundary curves $y = Y_1(x)$ and $y = Y_2(x)$.)

We have been led to the discovery of Theorem III by an argument based on
geometrical considerations which are highly plausible. The theorem itself is not
dependent on the geometrical interpretation, however. Nor is the assumption
that f(x, y) is positive an essential one. The purely analytical proof of the
theorem is discussed in Chapter 18, §18.61.

Example 1. Let the region R lie in the first quadrant of the xy-plane, and be
bounded by y = 0, $y^2 = x$, and x + 2y = 3. Let the surface be the plane
3x + 4y + 2z = 12. Sketch the solid and find the volume for this particular
instance of the foregoing discussion.



The region R is shown in Fig. 98. The intersection of the parabola and line is
at (1, 1). The solid in question is shown in Fig. 99. The top surface is

$$z = \tfrac{1}{2}(12 - 3x - 4y).$$

Hence the volume is

$$V = \tfrac{1}{2} \iint_R (12 - 3x - 4y)\,dA.$$

To express this as an iterated integral we read from Fig. 98 that a line parallel to
the x-axis cuts the boundary of R at

$$x = y^2 \quad\text{and}\quad x = 3 - 2y.$$

Thus $X_1 = y^2$ and $X_2 = 3 - 2y$ in this case. The extreme values of y are y = 0,
y = 1. Hence

$$V = \tfrac{1}{2} \int_0^1 dy \int_{y^2}^{3-2y} (12 - 3x - 4y)\,dx.$$
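The first integration yields

$$\begin{aligned}
\left[ 12x - \tfrac{3}{2}x^2 - 4xy \right]_{x=y^2}^{x=3-2y}
&= 36 - 24y - \tfrac{3}{2}(9 - 12y + 4y^2) - 12y + 8y^2 - \left( 12y^2 - \tfrac{3}{2}y^4 - 4y^3 \right) \\
&= \tfrac{45}{2} - 18y - 10y^2 + 4y^3 + \tfrac{3}{2}y^4.
\end{aligned}$$

Hence

$$\begin{aligned}
V &= \tfrac{1}{2} \int_0^1 \left( \tfrac{45}{2} - 18y - 10y^2 + 4y^3 + \tfrac{3}{2}y^4 \right) dy \\
&= \tfrac{1}{2} \left[ \tfrac{45}{2}y - 9y^2 - \tfrac{10}{3}y^3 + y^4 + \tfrac{3}{10}y^5 \right]_0^1 \\
&= \tfrac{1}{2} \left( \tfrac{45}{2} - 9 - \tfrac{10}{3} + 1 + \tfrac{3}{10} \right) = \tfrac{86}{15},
\end{aligned}$$

or

V = 5.733 ....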

Example 2. Express the double integral of Example 1 as an iterated integral in
which the first integration is with respect to y.

When the integration is done in the order requested, we observe that the upper
limit $Y_2$ of the formula (13.3-6) has two different expressions, according as
$0 \le x \le 1$ or $1 \le x \le 3$ (see Fig. 100).

$$Y_2 = \sqrt{x}, \quad 0 \le x \le 1; \qquad Y_2 = \tfrac{1}{2}(3 - x), \quad 1 \le x \le 3.$$

Under these conditions the iterated integral must be written as the sum of two
iterated integrals:

$$\iint_R f(x, y)\,dA = \int_0^1 dx \int_0^{\sqrt{x}} f(x, y)\,dy + \int_1^3 dx \int_0^{(3-x)/2} f(x, y)\,dy.$$

In the present case $f(x, y) = \tfrac{1}{2}(12 - 3x - 4y)$.

We leave it for the student to complete the integrations and check with the 
result of Example 1. 
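The agreement of the two orders of integration can also be checked numerically before doing the work by hand. This is a rough sketch only: it uses a composite Simpson rule, which is an assumption of this note and not a method from the text, and the function names are hypothetical.

```python
def simpson(g, a, b, n=100):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def f(x, y):
    # height of the plane 3x + 4y + 2z = 12 above the point (x, y)
    return 0.5 * (12 - 3 * x - 4 * y)

def volume_x_first():
    # Example 1: inner integration in x from y^2 to 3 - 2y, then y from 0 to 1
    inner = lambda y: simpson(lambda x: f(x, y), y * y, 3 - 2 * y)
    return simpson(inner, 0.0, 1.0)

def volume_y_first():
    # Example 2: inner integration in y; the upper limit changes at x = 1
    left = lambda x: simpson(lambda y: f(x, y), 0.0, x ** 0.5)
    right = lambda x: simpson(lambda y: f(x, y), 0.0, (3 - x) / 2)
    # a finer grid on [0, 1] because sqrt(x) is not smooth at x = 0
    return simpson(left, 0.0, 1.0, n=2000) + simpson(right, 1.0, 3.0)

print(volume_x_first(), volume_y_first(), 86 / 15)
```

Both orders should reproduce the value 86/15 of Example 1 to within the accuracy of the quadrature.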

Fig. 100.

The exercises following this section are intended to give the student practice
in evaluating double integrals by means of iterated integrals. In these exercises
the applications are limited to the calculation of volumes and the location of
centers of gravity of thin sheets of constant areal density. The use of double
integrals for this purpose was explained in §13.1. If the thin sheet (often called a
lamina) is specified as to shape and position by a region R in the xy-plane, and if
the constant areal density is $\sigma$, the co-ordinates $(\bar{x}, \bar{y})$ of its center of gravity are
given by

$$M\bar{x} = \iint_R x\sigma\,dA, \quad M\bar{y} = \iint_R y\sigma\,dA, \tag{13.3-7}$$

M being the mass of the sheet. If A is the area of R, we have $M = \sigma A$. Dividing
both sides of the formulas (13.3-7) by the constant factor $\sigma$, we have

$$A\bar{x} = \iint_R x\,dA, \quad A\bar{y} = \iint_R y\,dA. \tag{13.3-8}$$

It thus appears that $\bar{x}$ and $\bar{y}$ are independent of $\sigma$. The point $(\bar{x}, \bar{y})$ is thus a
geometric characteristic of the region R, and is not affected by the material
composing the sheet. This point is often called the centroid of R. It will be
observed (compare Theorem I, §13.22) that $\bar{x}$ is the mean value of x in R; a
similar statement holds for $\bar{y}$.
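The centroid formulas lend themselves to a direct numerical sketch. The code below is an illustration only: it assumes a region described, as in Theorem III, by $X_1(y) \le x \le X_2(y)$ for $a \le y \le b$, replaces the integrals by midpoint sums, and uses the hypothetical name `centroid`. It is applied to the region R of Example 1.

```python
def centroid(x_lo, x_hi, a, b, n=400):
    """Centroid of the region x_lo(y) <= x <= x_hi(y), a <= y <= b.

    The three double integrals (area, first moments) are approximated by
    midpoint sums over n horizontal strips, each cut into n cells.
    """
    h = (b - a) / n
    area = moment_x = moment_y = 0.0
    for i in range(n):
        y = a + (i + 0.5) * h
        x1, x2 = x_lo(y), x_hi(y)
        w = (x2 - x1) / n
        for j in range(n):
            x = x1 + (j + 0.5) * w
            dA = w * h
            area += dA
            moment_x += x * dA   # contributes to the integral of x dA
            moment_y += y * dA   # contributes to the integral of y dA
    return moment_x / area, moment_y / area

# Region of Example 1: bounded by y = 0, y^2 = x, and x + 2y = 3.
x_bar, y_bar = centroid(lambda y: y * y, lambda y: 3 - 2 * y, 0.0, 1.0)
print(x_bar, y_bar)
```

For this region the iterated integrals can be done by hand, giving area 5/3 and centroid (31/25, 7/20), which the sums should approach.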


EXERCISES 

1. Compute the volume of each of the following regions by the use of a double
integral. All literal constants a, b, c, etc. are assumed to be positive.

(a) The tetrahedron cut from the first octant by the plane 3x + 4y + 2z = 12.

(b) The first octant section cut from the region inside the cylinder $x^2 + z^2 = a^2$ by the
planes z = 0, y = 0, x = y.

(c) The hemisphere $x^2 + y^2 + z^2 \le a^2$, $z \ge 0$.

(d) The region between the paraboloid $a^2 z = H(a^2 - x^2 - y^2)$ and the xy-plane.

(e) The region bounded by the ellipsoid $(x^2/a^2) + (y^2/b^2) + (z^2/c^2) = 1$.

(f) The first octant portion of the region inside the cone $a^2 y^2 = b^2(x^2 + z^2)$ and between
y = 0 and y = h.

(g) The first octant region bounded by the co-ordinate planes and the cylinders
$a^2 y = b(a^2 - x^2)$, $a^2 z = c(a^2 - x^2)$.

2. Find by double integrals the volumes of the tetrahedrons described:

(a) With plane faces x = 0, z = 0, x + y = 5, 8x - 12y + 15z = 0;

(b) With vertices (0, 0, 0), (3, 0, 0), (2, 1, 0), (3, 0, 4);

(c) Cut from the first octant by the plane (x/a) + (y/b) + (z/c) = 1.

3. Interpret the iterated integral 

fV^^ 2x + 4y 

dy i ~1- dX 

as the volume of a certain solid, and describe the solid geometrically. Calculate the 
volume. 

4. Locate the centroids of the following plane regions, using double integrals. All
literal constants are assumed to be positive.

(a) The triangle with vertices at (0, 0), (a, 0), (a, b).

(b) The triangle with vertices at (0, 0), (a, 0), (b, c), where a > b.

(c) The semicircular region $x^2 + y^2 \le a^2$, $x \ge 0$.

(d) The region in the first quadrant and inside the ellipse $(x^2/a^2) + (y^2/b^2) = 1$.

(e) The first quadrant region bounded by $By^2 = H^2 x$, x = B, y = 0.

(f) The region between the curve $x^3 = y^2$ and the line x = 1.

(g) The region bounded by $bx^2 = a^2 y$ and the line ay = bx.

5. Locate the centroid of the region described by $x^n \le y \le 1$, $0 \le x \le 1$. What is the
limiting position of the centroid as $n \to \infty$?

6. Solve Exercise 1(b) by use of a double integral over a suitable region in the
xz-plane.
7. Interpret the integral 



as a double integral arising in the location of the centroid of a certain plane region, and 
hence write down the value of the integral without actually carrying out the integration. 

8. Locate the centroids of each of the following plane regions:

(a) The region in the first quadrant between x = 0, x = 1, and between $y = x - x^2$, $y^2 = 2x$.

(b) The region bounded by the two parabolas $y = x^2 + x$, $y = 2x^2 - 2$.

(c) The region defined by $x^2 \le y \le 2 - x$, $0 \le x \le 1$.

(d) The region bounded by y = 0, x + y = 2, and the first quadrant part of $y = x^2$.


13.4 / USE OF POLAR CO-ORDINATES 

In suitable situations the evaluation of double integrals is greatly simplified by 
the use of polar co-ordinates. We shall explain the details of such use. 

Suppose we wish to find the value of a double integral

$$\iint_R f(x, y)\,dA. \tag{13.4-1}$$

If we make the change to polar co-ordinates

$$x = r\cos\theta, \quad y = r\sin\theta,$$

the function f(x, y) becomes a function of r and $\theta$, say

$$f(x, y) = F(r, \theta). \tag{13.4-2}$$

We are going to appeal to Theorem II (§13.23). Instead of decomposing R by a
rectangular partition, we make a subdivision based on a series of circles
r = constant and a series of rays $\theta$ = constant. These two series of curves are
the two one-parameter families associated with the curvilinear co-ordinates
r, $\theta$ (see §9.4). We consider the cells of this subdivision which belong
entirely to the region R, and number them consecutively in any order (see
Fig. 101). Suppose that there are n cells and that $\Delta A_k$ is the area of the kth
cell. According to Theorem II, referred to above,

$$\iint_R f(x, y)\,dA = \lim \sum f(x_k, y_k)\,\Delta A_k, \tag{13.4-3}$$



Fig. 101.

where $(x_k, y_k)$ is a point which may be chosen arbitrarily in the kth cell. Let us
agree to choose the point midway between the two circular arcs bounding the
cell, and also midway between the two rays bounding the cell (see Fig. 102). If
the radii of the circular arcs differ by $\Delta r_k$, and if the rays make an angle $\Delta\theta_k$
with each other, the area $\Delta A_k$ is easily computed in terms of $\Delta\theta_k$, $\Delta r_k$ and the
polar co-ordinates $(r_k, \theta_k)$ of the chosen point. The two circles are respectively

$$r = r_k - \tfrac{1}{2}\Delta r_k, \quad r = r_k + \tfrac{1}{2}\Delta r_k.$$


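Using the formula for the area of a sector of a circle, we have

$$\Delta A_k = \tfrac{1}{2} \left[ (r_k + \tfrac{1}{2}\Delta r_k)^2 - (r_k - \tfrac{1}{2}\Delta r_k)^2 \right] \Delta\theta_k,$$

or

$$\Delta A_k = r_k\,\Delta r_k\,\Delta\theta_k.$$

Hence in view of (13.4-2), (13.4-3) may be rewritten

$$\iint_R f(x, y)\,dA = \lim \sum F(r_k, \theta_k)\,r_k\,\Delta r_k\,\Delta\theta_k. \tag{13.4-4}$$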
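The simplification of the sector formula for the cell area is a difference of two squares. Written out, with $r_k$ the midpoint radius and $\Delta r_k$, $\Delta\theta_k$ as above:

$$\tfrac{1}{2} \left[ (r_k + \tfrac{1}{2}\Delta r_k)^2 - (r_k - \tfrac{1}{2}\Delta r_k)^2 \right] \Delta\theta_k = \tfrac{1}{2} \cdot 2 r_k \cdot \Delta r_k \cdot \Delta\theta_k = r_k\,\Delta r_k\,\Delta\theta_k.$$

Because the point was chosen midway between the two arcs, this expression for the cell area is exact and not merely a first-order approximation.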

Our next task is to work out a method of calculating the right member of
(13.4-4). We do this with the aid of the concept of a point transformation, or
mapping, from one plane to another, as developed in §9.2. We regard $(r, \theta)$ as
rectangular co-ordinates in one plane, and (x, y) as rectangular co-ordinates in
another plane. The equations $x = r\cos\theta$, $y = r\sin\theta$ define a mapping from one
plane to the other. Under this mapping, each of the cells of the subdivision of R
shown in Fig. 101 corresponds to a rectangular cell in the $r\theta$-plane. The region R
itself maps into a certain region in the $r\theta$-plane. Let us denote this region by T.
Corresponding to the partition of R by the cells of the polar co-ordinate net, we
have a rectangular partition of T. We may consider the function $F(r, \theta)$ as being
defined in T, its value at $(r, \theta)$ being the same as the value of f(x, y) at the
corresponding point of R. If we number the cells in T in correspondence with
their counterparts in R, and let $\Delta S_k$ be the area of the kth cell (see Fig. 103 and
Fig. 104), then, in accordance with the notation of Fig. 102, the dimensions of this
kth cell are $\Delta r_k$ by $\Delta\theta_k$, so that $\Delta S_k = \Delta r_k\,\Delta\theta_k$. Accordingly, we see that the limit on
the right in (13.4-4) is precisely the limit which defines the double integral

$$\iint_T F(r, \theta)\,r\,dS$$

of the function $F(r, \theta)\,r$ over the region T in the $r\theta$-plane. In the notation of
(13.2-4) we may write

$$\iint_R f(x, y)\,dx\,dy = \iint_T F(r, \theta)\,r\,dr\,d\theta. \tag{13.4-5}$$

Observe the factor r in the integrand on the right. This formula is the
fundamental result we have been seeking. The last step is to express the integral
on the right as an iterated integral with respect to r and $\theta$. The limits of
integration will of course depend on the particular region T and may be found by
the method explained in §13.3. In practice, however, these limits of integration
are usually found directly by examining R. One must know the equations in polar
co-ordinates for the curves forming the boundary of R. If we regard f(x, y) and
$F(r, \theta)$ as defining the same scalar point function in R, we may write

$$\iint_R f(x, y)\,dA \quad\text{and}\quad \iint_R F(r, \theta)\,dA$$

interchangeably. The use of polar co-ordinates in evaluating the double integral 
(13.4-1) is then summed up as follows: 


$$\iint_R F(r, \theta)\,dA = \int_\alpha^\beta d\theta \int_{R_1}^{R_2} F(r, \theta)\,r\,dr = \int_a^b dr \int_{\Theta_1}^{\Theta_2} F(r, \theta)\,r\,d\theta. \tag{13.4-6}$$

Fig. 105a.

In these iterated integrals $\alpha$, $\beta$ are the extreme values of $\theta$, and a, b are the
extreme values of r, in the region R. The inner limits $R_1$, $R_2$, $\Theta_1$, $\Theta_2$ are read off
from the appropriate one of the two figures, as shown (Fig. 105a or Fig. 105b).

The use of polar co-ordinates may prove advantageous either by simplification
of the integrand, or by simplification of the limits of integration in
dealing with the iterated integrals. Experience and discernment are required to
be able to judge whether or not to use polar co-ordinates. The student's first task
is to practice the use of polar co-ordinates.


Example 1. Locate the centroid of the plane region R shown in Fig. 106 (above
the x-axis and between the circles of radii a, b).

The centroid is obviously on the y-axis, so $\bar{x} = 0$. The area A of R is
$(\pi/2)(b^2 - a^2)$. Hence

$$\frac{\pi}{2}(b^2 - a^2)\,\bar{y} = \iint_R y\,dA.$$

The boundaries of R have very simple equations in polar co-ordinates. Therefore,
we evaluate the double integral by an iterated integral in polar co-ordinates. Here

$$f(x, y) = y = r\sin\theta = F(r, \theta).$$

Also, $\alpha = 0$, $\beta = \pi$, $R_1 = a$, $R_2 = b$. Therefore, supplying the extra factor r in the
integrand, we have

$$\iint_R y\,dA = \int_0^\pi d\theta \int_a^b r^2 \sin\theta\,dr = \frac{b^3 - a^3}{3} \int_0^\pi \sin\theta\,d\theta = \tfrac{2}{3}(b^3 - a^3).$$

Then

$$\bar{y} = \frac{4}{3\pi}\,\frac{b^3 - a^3}{b^2 - a^2} = \frac{4}{3\pi}\,\frac{b^2 + ba + a^2}{b + a}.$$

For a semicircular region we put a = 0. In this case $\bar{y} = \dfrac{4b}{3\pi}$.

Example 2. Find the volume inside the cylinder $x^2 + (y - a)^2 = a^2$ and
between the plane z = 0 and the paraboloid $4az = x^2 + y^2$.

The volume in question is given by

$$V = \frac{1}{4a} \iint_R (x^2 + y^2)\,dA,$$

where R is the region in the xy-plane bounded by the circle $x^2 + (y - a)^2 = a^2$.
Half of the volume is shown in Fig. 107. Polar co-ordinates are convenient for
this problem, because of both the form of the integrand and the equation of the
boundary of R in polar co-ordinates. The polar equation of the boundary of R is
$r = 2a\sin\theta$ (see Fig. 108). Hence we have

$$V = \frac{1}{4a} \iint_R r^2\,dA = \frac{1}{4a} \int_0^\pi d\theta \int_0^{2a\sin\theta} r^3\,dr = a^3 \int_0^\pi \sin^4\theta\,d\theta = 2a^3 \int_0^{\pi/2} \sin^4\theta\,d\theta,$$

$$V = 2a^3 \cdot \frac{3 \cdot 1}{4 \cdot 2} \cdot \frac{\pi}{2} = \frac{3\pi}{8}\,a^3.$$

In evaluating the last integral we have used one of the standard tabulated
formulas for the definite integrals

$$\int_0^{\pi/2} \sin^n\theta\,d\theta, \quad n = 1, 2, \ldots.$$

These formulas are very convenient, and the student should be familiar with
them.
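Both examples can be checked against a direct numerical version of the first iterated integral in (13.4-6). The sketch below is an illustration under assumptions of this note: the integrals are replaced by midpoint sums, the name `polar_integral` is hypothetical, and the sample values a = 1, b = 2 are arbitrary.

```python
import math

def polar_integral(F, alpha, beta, r_lo, r_hi, n=400):
    """Midpoint-sum approximation to the iterated integral
    int_alpha^beta d(theta) int_{r_lo(theta)}^{r_hi(theta)} F(r, theta) r dr."""
    dtheta = (beta - alpha) / n
    total = 0.0
    for i in range(n):
        theta = alpha + (i + 0.5) * dtheta
        r1, r2 = r_lo(theta), r_hi(theta)
        dr = (r2 - r1) / n
        for j in range(n):
            r = r1 + (j + 0.5) * dr
            total += F(r, theta) * r * dr * dtheta   # note the extra factor r
    return total

# Example 1 with a = 1, b = 2: first moment of the half-annulus, then y-bar.
a, b = 1.0, 2.0
first_moment = polar_integral(lambda r, t: r * math.sin(t), 0.0, math.pi,
                              lambda t: a, lambda t: b)
y_bar = first_moment / (math.pi / 2 * (b * b - a * a))

# Example 2 with a = 1: volume under 4az = x^2 + y^2 inside r = 2a sin(theta).
volume = polar_integral(lambda r, t: r * r / (4 * a), 0.0, math.pi,
                        lambda t: 0.0, lambda t: 2 * a * math.sin(t))
print(y_bar, volume)
```

With these sample values the closed forms of the examples give $\bar{y} = 28/(9\pi)$ and $V = 3\pi/8$, which the sums should approach as n grows.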


EXERCISES 

1. Calculate the volumes of the solids here described, using double integrals and
polar co-ordinates.

(a) Inside the cylinder $x^2 + y^2 = a^2$, between the planes z = 0, z = x, and in which $x \ge 0$.

(b) Inside the sphere $x^2 + y^2 + z^2 = a^2$. Deal with the first octant only, and use symmetry.

(c) Inside the cylinder $x^2 + y^2 = a^2$ and between z = 0 and $a^2 z = h(x^2 + y^2)$.

(d) Between the cone $c^2(z - h)^2 = h^2(x^2 + y^2)$ and the plane z = 0.

(e) Inside both the sphere $x^2 + y^2 + z^2 = 4a^2$ and the cylinder $x^2 + (y - a)^2 = a^2$.

(f) Inside the cylinder $x^2 + y^2 = 2ax$ and between the plane z = 0 and the cone
$z^2 = x^2 + y^2$.

(g) Inside the prism bounded by the planes y = x, y = 0, $x = a/\sqrt{2}$, and between the plane
z = 0 and the cone $az = h(x^2 + y^2)^{1/2}$.
2. Locate the centroids of the plane regions described as follows, using double
integrals and polar co-ordinates:

(a) In the first quadrant, between $x^2 + y^2 = 2ax$ and y = 0.

(b) Between $r = 2a\cos\theta$ and $r\cos\theta = a$, and on the side of the latter curve away from
the origin.

(c) Inside the cardioid $r = a(1 + \sin\theta)$.

(d) Inside the first quadrant loop of $r = a\sin 2\theta$.

(e) In the first quadrant, inside $r = 2a\cos\theta$ and outside r = a.

(f) Inside the loop of $r^2 = 2a\cos 2\theta$ which is bisected by the ray $\theta = 0$.

3. Find each of the two volumes into which the volume in Exercise 1(e) is divided by
the cylinder $x^2 + y^2 = a^2$.

4. Find the volume inside the sphere $x^2 + (y - a)^2 + z^2 = a^2$ and between the planes
x = 0, y = x.

5. Find the volume between the paraboloid $z = x^2 + y^2$ and the plane z = x.


13.5 / APPLICATIONS OF DOUBLE INTEGRALS 

In §13.1 we introduced the concept of a thin sheet of material substance. The 
concept of a distribution of matter without thickness is a very useful one. A 
plane region which carries such a mass distribution is called a lamina. A lamina 
is a mathematical idealization of a thin sheet, just as a particle is a mathematical 
idealization of a small, concentrated bit of matter. One may also speak of 
laminas which are curved surfaces, but here we shall deal only with plane 
laminas. 

We wish to introduce the concept of a lamina of variable density. In the case
of constant density, the density of a lamina is the ratio of mass to area:

$$\sigma = \frac{M}{A}. \tag{13.5-1}$$

But we may imagine a lamina in which the mass is so distributed that various
pieces of the lamina, although of equal area, will have different masses. For the
general case, the density of a lamina is, by definition, an integrable function
$\sigma(x, y)$ such that when it is integrated over any subregion $\Delta R$ of the lamina, it
gives the mass of that portion:

$$\Delta M = \iint_{\Delta R} \sigma\,dA. \tag{13.5-2}$$

In particular, the total mass is

$$M = \iint_R \sigma\,dA. \tag{13.5-3}$$

We shall consider only the case of continuous densities. If the area of $\Delta R$ is
$\Delta A$ we see from (13.5-2) by the mean-value theorem (§13.22) that

$$\sigma' \le \frac{\Delta M}{\Delta A} \le \sigma'', \tag{13.5-4}$$

where $\sigma'$ and $\sigma''$ are respectively the minimum and maximum values of the
density in $\Delta R$. Hence, if (x, y) is a fixed point of $\Delta R$, and if we shrink the
subregion $\Delta R$ so that its maximum diameter approaches zero, we see by the
continuity of $\sigma$ that

$$\lim \frac{\Delta M}{\Delta A} = \sigma(x, y). \tag{13.5-5}$$

This relation replaces (13.5-1) when we deal with laminas of variable density.

Example 1. The density of a square lamina of side b varies in direct proportion
to the distance from a particular corner of the square. Find the mass of the
lamina.

Let us take the square as shown in Fig. 109 with the particular corner at the
origin. We are given that

$$\sigma = k\sqrt{x^2 + y^2},$$

k being a constant of proportionality. Hence,

$$M = k \iint_R \sqrt{x^2 + y^2}\,dx\,dy,$$

R being the square region. We meet difficulties if we proceed to the iterated
integral in rectangular co-ordinates. It is better to use polar co-ordinates. The
line y = x divides the square into two triangles which are evidently of equal
mass. The line x = b has the polar equation $r = b\sec\theta$. Hence, if $R_1$ denotes the
triangle below the line y = x,

$$M = 2k \iint_{R_1} r\,dA = 2k \int_0^{\pi/4} d\theta \int_0^{b\sec\theta} r^2\,dr,$$

$$M = \frac{2kb^3}{3} \int_0^{\pi/4} \sec^3\theta\,d\theta = \frac{2kb^3}{3} \left[ \tfrac{1}{2}\tan\theta\sec\theta + \tfrac{1}{2}\log\tan\left( \frac{\pi}{4} + \frac{\theta}{2} \right) \right]_0^{\pi/4},$$

$$M = \frac{kb^3}{3} \left[ \sqrt{2} + \log\tan\frac{3\pi}{8} \right].$$
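The closed form for the mass can be compared with a direct sum over a rectangular partition of the square, in the spirit of the definition (13.2-2). This is only a sketch: the midpoint choice of the point in each cell, the uniform grid, and the function names are assumptions of this note.

```python
import math

def lamina_mass(k, b, n=400):
    """Sum of sigma(x_k, y_k) * dA_k over an n-by-n partition of the square
    0 <= x <= b, 0 <= y <= b, with sigma = k * sqrt(x^2 + y^2)."""
    h = b / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            total += k * math.hypot(x, y) * h * h
    return total

def closed_form(k, b):
    """The value (k b^3 / 3) [sqrt(2) + log tan(3 pi / 8)] found in Example 1."""
    return k * b ** 3 / 3 * (math.sqrt(2) + math.log(math.tan(3 * math.pi / 8)))

print(lamina_mass(1.0, 1.0), closed_form(1.0, 1.0))
```

Since $\tan(3\pi/8) = 1 + \sqrt{2}$, the closed form may equally be written with $\log(1 + \sqrt{2})$.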

Many physical concepts are first formulated for systems of a finite number
of discrete particles, and then extended to continuous distributions of matter by
the use of integrals. The guiding principle is that of subdividing the continuous
distribution into small parts. Each part is then replaced by a particle, which is
obtained by concentrating all the mass of the part at some point within the part.
The resulting system of particles is then regarded as an approximation to the
continuous distribution, and physical attributes of the continuous mass are
assumed to be obtained as limits of the corresponding physical attributes of the
approximating system of particles. Where the physical attribute of the system of
particles is expressed by a sum, that of the continuous mass will be expressed as
the limit of a sum, and this limit will normally be a definite integral. Illustrations
are furnished by such concepts as center of mass, moment of inertia, and
gravitational or electrostatic attraction. In the case of a lamina of continuous
density, when it is subdivided into small parts, of areas $\Delta A_1, \ldots, \Delta A_n$, the mass
$\Delta M_k$ of the kth part will be expressed by

$$\Delta M_k = \sigma(x_k, y_k)\,\Delta A_k, \tag{13.5-6}$$

where $(x_k, y_k)$ is a suitably chosen point in the part [see (13.5-4) and the remarks
at the end of §13.22].

Using the general principle described in the foregoing paragraph we find that
the center of mass (center of gravity) of a lamina is given by formulas (13.3-7).
When $\sigma$ is variable it cannot be taken from under the integral sign, of course. In
this case we cannot use formulas (13.3-8), and M must be found by integration.

If L is a straight line in space, and if we have a system of particles of masses
$m_1, \ldots, m_n$, the perpendicular distance from $m_k$ to L being $r_k$, then the moment
of inertia of the system about L as an axis is defined to be

$$I = m_1 r_1^2 + \cdots + m_n r_n^2.$$

To extend this definition to laminas in the xy-plane, let $P_k$ denote the point
$(x_k, y_k)$ in (13.5-6), and let $Q_k$ be the foot of the perpendicular drawn from $P_k$ to
L. Then for the approximating system of particles we have $r_k = P_kQ_k$,
$m_k = \sigma(x_k, y_k)\,\Delta A_k$; therefore, the moment of inertia of the lamina about L is

$$I = \lim \sum_{k=1}^{n} (P_kQ_k)^2\,\sigma(x_k, y_k)\,\Delta A_k.$$

This limit is clearly a double integral. If D(x, y) denotes the perpendicular
distance PQ from a typical point P(x, y) of the lamina to the axis L, the double
integral is

$$I = \iint_R \sigma D^2\,dA. \tag{13.5-7}$$

In each particular problem D must be expressed as a function of the
co-ordinates. For instance, if L is the y-axis, $D^2 = x^2$, while if L is the z-axis,
$D^2 = x^2 + y^2$.

Moments of inertia are often expressed in the form $I = Mk^2$, M being the
mass. The constant k is called the radius of gyration.
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Example 2 . Prove the following proposition: Let 
a lamina occupy the region R, and let L be a line 
in the plane of the lamina. Let I denote the mo- 
ment of inertia of the lamina about L , and let J 0 de- 
note the moment of inertia about an axis Lo parallel 
to L through the center of mass of the lamina. Then 

I = I 0 + Mh 2 (13.5-8) 

where h is the distance between L and Lo. 

It is convenient to locate our co-ordinate system 
so that L coincides with the y-axis. See Fig. 110. 
This is permissible, since the physical quantities are 
independent of the position of the co-ordinate axes. The 
is D = |x|. Hence 


V 



Fig. 110. 


distance from P(x , y) to L 


1 = 


it 


ax 2 dA. 


R 


The equation of L 0 is x = x, and so 


for x = h. 
Thus 


I 0 = J J cr(x - x) 2 dA - j j <j(x - h ) 2 dA, 


/-/(, = JJ o-[x 2 - (x - h) 2 ] dA. 


Now x 2 - (x - h) 2 = 2 xh - h 2 . Therefore 


I-I 0 = 2h J! axdA-h 2 H a dA = 2hMx - h 2 M = Mh 2 . 


Thus (13.5-8) is proved. 
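
The proposition can also be checked numerically. The following is a minimal sketch, not part of the original text: it replaces the double integrals by midpoint-rule sums of the kind in (13.5-6) over a rectangular lamina. The density `sigma`, the rectangle dimensions, and the helper name `lamina_sums` are all hypothetical choices made for illustration.

```python
def lamina_sums(sigma, a, b, x0=0.0, n=200):
    """Midpoint-rule sums over the rectangle 0 <= x <= a, 0 <= y <= b.

    Returns (M, M_x, I): the mass, the first moment about the y-axis,
    and the moment of inertia about the line x = x0, for which D = |x - x0|.
    """
    dx, dy = a / n, b / n
    M = M_x = I = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        for j in range(n):
            y = (j + 0.5) * dy
            dM = sigma(x, y) * dx * dy      # Delta M_k as in (13.5-6)
            M += dM
            M_x += x * dM
            I += (x - x0) ** 2 * dM
    return M, M_x, I


def sigma(x, y):
    return 1.0 + x * y                      # hypothetical variable density


M, M_x, I = lamina_sums(sigma, 2.0, 1.0)             # L is the y-axis
x_bar = M_x / M                                      # center of mass; here h = x_bar
_, _, I_0 = lamina_sums(sigma, 2.0, 1.0, x0=x_bar)   # L_0 is the line x = x_bar
print(I, I_0 + M * x_bar ** 2)                       # the two sides of (13.5-8)
```

Because both sides are built from the same sums, they agree up to rounding error, which mirrors the algebra of the proof: the cross term $2h\iint_R \sigma x\,dA$ is exactly $2hM\bar{x}$.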

If we set $\sigma = 1$ in (13.5-7), the resulting integral

$$\iint_R D^2\,dA \tag{13.5-9}$$

is called the second moment of the region R with respect to the axis L. This is a 
purely geometric characteristic of R in relation to L. Second moments are used 
in the theory of elasticity and strength of materials, particularly in the theory of 
the bending of beams. In the literature these second moments are often called 
moments of inertia. This is a misnomer, since no mass concept is involved. The
physical dimensions of a moment of inertia are mass × (length)$^2$, while those of a
second moment of a plane region are (length)$^4$. Second moments are also
important in statistics and elsewhere. The integrals

$$\iint_R x\,dA, \qquad \iint_R y\,dA$$

occurring in the formulas (13.3-8) for the centroid are, by contrast, called first 
moments (about the y-axis and x-axis, respectively). 

EXERCISES 

1. In each of the parts of this exercise a lamina of a certain shape is described, and 
the manner in which its density varies is defined. Find the mass and locate the center of 
mass of each lamina. Wherever it occurs in this exercise, k denotes a constant of 
proportionality. 

(a) Triangular lamina with vertices at (0, 0), (a, 0), (a, b); $\sigma = kx$.

(b) The same lamina as in (a), but with $\sigma = ky$.

(c) The lamina occupying the region defined by $x^2 + y^2 \le a^2$, $x \ge 0$, $y \ge 0$, with $\sigma = kx$.

(d) The lamina of (c), but with $\sigma = kxy$.

(e) The lamina of (c), but with $\sigma = k(x^2 + y^2)^{1/2}$.

(f) The lamina in the first quadrant, bounded by $bx^2 = a^2y$, $x = 0$, $y = b$, with $\sigma = kx$.

(g) The lamina of (f), but with $\sigma = k(b - y)$.

(h) The triangular lamina cut from the first quadrant by the line x + y = a, with $\sigma$ directly
proportional to the product of the distances from (x, y) to the sides of the triangle.

(i) The lamina in the first quadrant, bounded by $r = 2a\cos\theta$ and $\theta = 0$, with $\sigma = kr$.

(j) The lamina of (i), but with $\sigma = kr\sin 2\theta$.

2. For any distribution of mass, let $I_x$ and $I_y$ denote the moments of inertia of the
distribution about the x-axis and the y-axis, respectively, and let $I_0$ denote the moment of
inertia about the axis perpendicular to the xy-plane at the origin. Show that $I_0 = I_x + I_y$.

3. In each part of this exercise, a homogeneous lamina is described. Find $I_x$, $I_y$, and
$I_0$ in each case (see Exercise 2).

(a) The circular lamina bounded by $x^2 + y^2 = a^2$.

(b) The annulus bounded by the two circles $x^2 + y^2 = r_i^2$ ($i = 1, 2$; $r_1 < r_2$).

(c) The rectangular lamina bounded by $x = \pm a$, $y = \pm b$.

(d) The triangular lamina bounded by y = 0, x = a, ay = bx.

(e) The elliptical lamina bounded by $b^2x^2 + a^2y^2 = a^2b^2$.

(f) The lamina bounded by $y^2 = 2ax$ and $x = 2a$.

(g) The lamina occupying the circular segment $x^2 + y^2 \le b^2$, $x \ge b\cos\alpha$, where
$0 < \alpha < \pi/2$.

(h) The lamina occupying the circular sector $0 \le r \le b$, where $0 < \beta \le \pi/2$.

4. For a lamina occupying a region R in the xy-plane, the double integral

$$U_{xy} = \iint_R \sigma xy\,dA$$

is called the product of inertia of the lamina with respect to the co-ordinate axes.
Calculate this product of inertia for each of the following laminas, assuming constant
density:

(a) The rectangular lamina bounded by x = 0, x = a, y = 0, y = b.

(b) The first-quadrant quarter of the circular lamina bounded by $x^2 + y^2 = a^2$.

(c) The triangular lamina with vertices at (0, 0), (a, 0), (a, b).

(d) The square lamina bounded by x = -a, x = 2a, y = -2a, y = a.

(e) The lamina composed of all except the third-quadrant portion of the region inside the
circle $x^2 + y^2 = a^2$.

(f) The lamina bounded by $y^2 = 2a(x + a)$ and $x = a$.

5. Products of inertia, as defined in Exercise 4, play an important role in the 
discussion of what happens to moments of inertia when the co-ordinate axes are rotated. 
Suppose the xy-system can be rotated into the x'y'-system by turning counterclockwise
through an angle $\alpha$. The equations relating the two systems are

$$x' = x\cos\alpha + y\sin\alpha,$$

$$y' = -x\sin\alpha + y\cos\alpha,$$

and a similar pair of equations giving x, y in terms of x', y'. For a given lamina, let $A = I_x$,
$B = U_{xy}$, $C = I_y$ (using the notation defined in Exercises 2 and 4), and let A', B', C' denote
the corresponding moments and products of inertia relative to the axes of the x'y'-system.
Show that

$$A' = A\cos^2\alpha - 2B\sin\alpha\cos\alpha + C\sin^2\alpha,$$

$$A = A'\cos^2\alpha + 2B'\sin\alpha\cos\alpha + C'\sin^2\alpha,$$

$$B' = B(\cos^2\alpha - \sin^2\alpha) + (A - C)\sin\alpha\cos\alpha,$$

and find three more formulas of this type. Hence prove that the curve

$$Ax^2 - 2Bxy + Cy^2 = 1$$

has the equation

$$A'x'^2 - 2B'x'y' + C'y'^2 = 1$$

when referred to the x'y'-axes.

This curve is called the ellipse of inertia for the given lamina, relative to the origin O.
If the xy-axes coincide with the axes of symmetry of the ellipse, the xy-term in the
equation must disappear, so that B = 0. In this position, the co-ordinate axes are called
principal axes of inertia for the lamina relative to O. If $B \ne 0$, the position of the principal
axes may be found by the method of choosing $\alpha$ so that B' = 0, i.e.,

$$B\cos 2\alpha + \tfrac{1}{2}(A - C)\sin 2\alpha = 0.$$

6. Show that, if one regards A' as a function of $\alpha$, $dA'/d\alpha = -2B'$; hence show that A' is
either a maximum or a minimum when the x'y'-axes are principal axes of inertia.

7. Use the formula for A' in Exercise 5 to compute the following moments of inertia: 

(a) The lamina of Exercise 4(a), about its diagonal. 

(b) The lamina of Exercise 4(c), about the line 2ay = bx. 

(c) The lamina of Exercise 4(f), about the line y = 2x.

8. Find the principal axes of inertia for the following laminas:

(a) The lamina of Exercise 4(a), if a = 2, b = 1. 

(b) The lamina of Exercise 4(b). 

(c) The lamina of Exercise 4(d). 

(d) The lamina of Exercise 4(e). 

(e) The lamina of Exercise 4(f). 

9. A lamina in the shape of the circle $x^2 + y^2 \le a^2$ has density $\sigma = (x + y)^2$. Find its
principal axes of inertia relative to its center, and the moments of inertia about these axes. 

13.51 / POTENTIALS AND FORCE FIELDS 

In the theory of electrostatics, the concepts of charge and charge density are 
entirely analogous to the concepts of mass and mass density, with this excep- 
tion: Charges may be either positive or negative, while we habitually think of 
masses as positive. A particle of electric charge e exerts an electrostatic force on 
another particle of charge e' according to the inverse-square law of Coulomb: 
The magnitude of the force is inversely proportional to the square of the 
distance between the charges, and directly proportional to the product of the 
charges. The force is directed along the line joining the charges, and like charges 
repel each other, while unlike charges attract. With proper choice of units 
(electrostatic units) the constant of proportionality may be taken as unity. 

The vector form of Coulomb's law is as follows: Let e be at P, e' at P', and r
be the distance PP'. Then the force exerted by e on e' is

$$\mathbf{F} = \frac{ee'}{r^3}\,\overrightarrow{PP'}. \tag{13.51-1}$$

This should be compared with the analogous formula for gravitational attraction 
between two particles (see (10.51-4)). 

Next we consider how to deal with the notion of electrostatic force 
produced by a continuous distribution of charge on a plane lamina. Consider a 
particle of unit positive charge at a fixed point Q, anywhere in space, but not on 
the lamina. Let $\sigma$ be the charge density on the lamina, which we assume
occupies a region R in the xy-plane. In the usual manner, we subdivide R and
consider the force exerted on Q by the system of point charges which is
obtained when we concentrate the charge $\Delta e$ of each part $\Delta R$ of the lamina at a
point P within the part. The contribution of this part to the total is a force

$$\frac{\Delta e}{r^3}\,\overrightarrow{PQ},$$

where r is the distance PQ (see Fig. 111). All such vectors must be added, and
then we must carry out the limiting process. Since $\Delta e$ is approximately $\sigma\,\Delta A$, the
total force exerted by the lamina is

$$\mathbf{F} = \iint_R \frac{\sigma}{r^3}\,\overrightarrow{PQ}\,dA. \tag{13.51-2}$$

This is a vector double integral; i.e., the integrand is a vector function. We have not
formally defined such integrals, but the work is entirely like that for scalar
double integrals.

When it comes to actual computation of F, we work with components of the vectors.
It frequently occurs that because of symmetry we know in advance the direction of F,
and hence need only to deal with components of F in that one direction. If L is a
directed line and if $\psi$ is the angle which PQ makes with L, the component of F in
the direction of L is

$$F_L = \iint_R \frac{\sigma}{r^2}\cos\psi\,dA. \tag{13.51-3}$$

[Fig. 111.]

To evaluate, we must express the integrand in suitable co-ordinates and then 
pass to an iterated integral. It must be kept in mind that the r in (13.51-3) is not 
necessarily the r of polar co-ordinates. 

The force exerted by a system of charges on a unit positive charge at Q is 
called the field at Q due to the system. 

Example 1. Find the field due to a uniformly charged circular lamina of radius b,
at a point Q a distance c from the center of the lamina along the
perpendicular to it (Fig. 112).

The phrase “uniformly charged” means that the density is constant. We take the origin
and the z-axis as shown in Fig. 112. Using polar co-ordinates in the xy-plane, we have the
distance PQ given by

$$(PQ)^2 = r^2 + c^2.$$

By symmetry the force is evidently in the direction OQ, so that

$$\cos\psi = \frac{c}{\sqrt{r^2 + c^2}}.$$

[Fig. 112.]

Hence, denoting the x, y, and z components of F by $F_1$, $F_2$, $F_3$, we have
$F_1 = F_2 = 0$, and

$$F_3 = \iint_R \frac{\sigma c}{(r^2 + c^2)^{3/2}}\,dA = \sigma c\int_0^{2\pi} d\theta \int_0^b \frac{r\,dr}{(r^2 + c^2)^{3/2}},$$

$$F_3 = 2\pi\sigma c\left(\frac{1}{c} - \frac{1}{\sqrt{b^2 + c^2}}\right) = 2\pi\sigma\left[1 - \frac{c}{\sqrt{b^2 + c^2}}\right].$$

Observe that the force very near O is almost of amount $2\pi\sigma$.
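
As a rough check of Example 1, the iterated integral for $F_3$ can be approximated by a midpoint sum in r and compared with the closed form. This is only a sketch under stated assumptions: the function names and the sample values of $\sigma$, b, c are hypothetical, and c is taken positive.

```python
import math


def disk_field_numeric(sigma, b, c, n=2000):
    """Midpoint-rule value of
    F_3 = sigma * c * int_0^{2 pi} dtheta int_0^b r dr / (r^2 + c^2)^(3/2).

    The integrand does not depend on theta, so the theta-integration
    contributes only the factor 2*pi.
    """
    dr = b / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * dr
        total += r / (r * r + c * c) ** 1.5 * dr
    return 2.0 * math.pi * sigma * c * total


def disk_field_closed(sigma, b, c):
    """Closed form found in Example 1 (valid for c > 0)."""
    return 2.0 * math.pi * sigma * (1.0 - c / math.sqrt(b * b + c * c))


F_num = disk_field_numeric(1.0, 2.0, 0.5)
F_closed = disk_field_closed(1.0, 2.0, 0.5)
F_near = disk_field_closed(1.0, 2.0, 1e-3)   # Q very near O
print(F_num, F_closed, F_near, 2.0 * math.pi)
```

The last two printed values illustrate the closing remark of the example: as c shrinks, $F_3$ approaches $2\pi\sigma$.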

A systematic study of the theory of electrostatic fields is greatly simplified
by introducing the concept of the potential of the field. The potential at a point
Q, produced by a charge e at the point P, is defined to be

$$\frac{e}{r}, \quad \text{where } r = PQ.$$

For the potential of several particles, the principle of superposition is used, and
for continuous distributions of charge, the standard integral calculus procedure
is employed. For a lamina on the region R, with charge density $\sigma$ at P, the
potential at Q is defined to be

$$u(Q) = \iint_R \frac{\sigma}{r}\,dA. \tag{13.51-4}$$

The potential is a scalar point function. The electrostatic field is a vector point
function. The relation between the two functions is shown in the fact that the
gradient of the potential gives the negative of the field vector:

$$\nabla_Q u(Q) = -\mathbf{F}. \tag{13.51-5}$$

The Q on the gradient symbol is to remind one that we must differentiate with
respect to the co-ordinates of Q. If P is (x, y, z), and Q is $(\xi, \eta, \zeta)$, we have

$$r^2 = (PQ)^2 = (\xi - x)^2 + (\eta - y)^2 + (\zeta - z)^2, \tag{13.51-6}$$

and

$$u(Q) = \iint_R \frac{\sigma}{r}\,dx\,dy, \tag{13.51-7}$$

$$\mathbf{F} = \iint_R \frac{\sigma}{r^3}\,[(\xi - x)\mathbf{i} + (\eta - y)\mathbf{j} + (\zeta - z)\mathbf{k}]\,dx\,dy. \tag{13.51-8}$$

Formula (13.51-5) is then equivalent to

$$\mathbf{F}\cdot\mathbf{i} = -\frac{\partial u}{\partial \xi} = \iint_R \frac{\sigma}{r^3}\,(\xi - x)\,dx\,dy, \tag{13.51-9}$$

and two similar formulas for the other components of F.
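
The relation (13.51-5) can be illustrated numerically. The sketch below is an illustration only: it approximates (13.51-7) and (13.51-8) by midpoint sums for a rectangular lamina in the plane z = 0, and compares F with a central-difference estimate of $-\nabla_Q u$. The lamina, the density, the point Q, and the step size are hypothetical choices, and Q is assumed to lie off the lamina.

```python
import math


def potential_and_field(sigma, a, b, Q, n=100):
    """Midpoint-rule values of u(Q) in (13.51-7) and F in (13.51-8) for a
    lamina on 0 <= x <= a, 0 <= y <= b in the plane z = 0."""
    xi, eta, zeta = Q
    dx, dy = a / n, b / n
    u, F = 0.0, [0.0, 0.0, 0.0]
    for i in range(n):
        x = (i + 0.5) * dx
        for j in range(n):
            y = (j + 0.5) * dy
            r = math.sqrt((xi - x) ** 2 + (eta - y) ** 2 + zeta ** 2)
            w = sigma(x, y) * dx * dy / r ** 3   # sigma dA / r^3
            u += w * r * r                       # sigma dA / r
            F[0] += w * (xi - x)
            F[1] += w * (eta - y)
            F[2] += w * zeta                     # z = 0 on the lamina
    return u, F


def minus_gradient(sigma, a, b, Q, step=1e-4):
    """Central-difference estimate of -grad_Q u."""
    result = []
    for m in range(3):
        Q_plus, Q_minus = list(Q), list(Q)
        Q_plus[m] += step
        Q_minus[m] -= step
        u_plus, _ = potential_and_field(sigma, a, b, Q_plus)
        u_minus, _ = potential_and_field(sigma, a, b, Q_minus)
        result.append(-(u_plus - u_minus) / (2.0 * step))
    return result


def sigma(x, y):
    return x * y                                 # hypothetical charge density


Q = (0.5, 0.25, 1.0)
u, F = potential_and_field(sigma, 2.0, 1.0, Q)
G = minus_gradient(sigma, 2.0, 1.0, Q)
print(F)
print(G)
```

The two printed vectors should agree to several decimal places; the agreement reflects differentiation under the integral sign, which is legitimate here because Q is not in R.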

It is only in very special instances that the potential can be computed in 
elementary form by integration. Usually the work leads to elliptic or other 
nonelementary integrals. Nevertheless, the study of the potential is very fruitful. 
Extensive consideration of the theory of potential functions is outside the scope 
of the present book. 

Example 2. The lamina bounded by the lines x = 0, x = a, y = 0, y = b in the
xy-plane carries a charge of density $\sigma = xy$. Find the potential at the point
$Q(0, 0, \zeta)$ on the z-axis.

The potential is

$$u = \int_0^b y\,dy \int_0^a \frac{x\,dx}{(x^2 + y^2 + \zeta^2)^{1/2}}.$$

The first integration gives

$$(a^2 + y^2 + \zeta^2)^{1/2} - (y^2 + \zeta^2)^{1/2},$$

$$u = \int_0^b [y(a^2 + y^2 + \zeta^2)^{1/2} - y(y^2 + \zeta^2)^{1/2}]\,dy,$$

$$u = \tfrac{1}{3}(a^2 + b^2 + \zeta^2)^{3/2} - \tfrac{1}{3}(a^2 + \zeta^2)^{3/2} - \tfrac{1}{3}(b^2 + \zeta^2)^{3/2} + \tfrac{1}{3}|\zeta|^3.$$
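
The closed form of Example 2 can be compared with a direct midpoint-rule approximation of the double integral. This is a hedged sketch; the function names and the sample values a = 2, b = 1, $\zeta = 1$ are hypothetical.

```python
import math


def potential_closed(a, b, zeta):
    """Closed form for u at Q(0, 0, zeta) with density sigma = x*y."""
    z2 = zeta * zeta
    return ((a * a + b * b + z2) ** 1.5
            - (a * a + z2) ** 1.5
            - (b * b + z2) ** 1.5
            + abs(zeta) ** 3) / 3.0


def potential_numeric(a, b, zeta, n=300):
    """Midpoint-rule value of the double integral of x*y / r over the lamina."""
    dx, dy = a / n, b / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        for j in range(n):
            y = (j + 0.5) * dy
            total += x * y / math.sqrt(x * x + y * y + zeta * zeta) * dx * dy
    return total


u_closed = potential_closed(2.0, 1.0, 1.0)
u_numeric = potential_numeric(2.0, 1.0, 1.0)
print(u_closed, u_numeric)
```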


EXERCISES 

1. Find the potential at Q in Example 1, and verify that $F_3 = -\partial u/\partial c$ (assuming c > 0).

2. Find $F_3 = \mathbf{F}\cdot\mathbf{k}$ directly in Example 2, and then verify that $F_3 = -\partial u/\partial\zeta$ from the
answer found in Example 2.

3. Find u and $F_3$ at the point Q in Example 1 if, instead of constant density, we have
$\sigma = r$.


4. Find the potential at a corner of a uniformly charged square lamina of side b. 

5. Find the potential at a point on the edge of a uniformly charged circular lamina of 
radius b. It is most convenient to take the point in question at the origin. 

6. Find the potential at Q(0, 0, b ), where b >0, due to a uniformly charged square 
lamina with corners at (0, 0, 0), (a, 0, 0), (a, a, 0), (0, a, 0). Set up the integral in polar 
co-ordinates, using the fact that the square can be divided by a diagonal so that each half 
contributes the same amount to the potential. The integral formula

$$\int \frac{\sqrt{a^2 + b^2\cos^2\theta}}{\cos\theta}\,d\theta = b\tan^{-1}\left(\frac{b\sin\theta}{\sqrt{a^2 + b^2\cos^2\theta}}\right) + \frac{a}{2}\log\frac{\sqrt{a^2 + b^2\cos^2\theta} + a\sin\theta}{\sqrt{a^2 + b^2\cos^2\theta} - a\sin\theta} + C$$

will be useful. The z-component of the field at Q may be computed from $F_3 = -\partial u/\partial b$, but
it is perhaps easier to compute $F_3$ directly by integration.

7. It can be shown that $\partial u/\partial\xi$ can be computed from (13.51-7) by doing the
differentiation under the integral sign, provided Q is a point not in the region R or on its
boundary. Proceed from this to verify (13.51-9), using (13.51-6). Thus (13.51-5) is proved.


13.6 / TRIPLE INTEGRALS 

We shall deal with the definition of a triple integral somewhat more briefly than 
we did with the definition of a double integral. We begin with a closed bounded 
region R in three dimensions, and let f(x, y, z) be a function defined and
continuous in R. As in §13.2 we must make some assumptions about the
character of the boundary of R. The precise nature of these assumptions need
not be made explicit as long as we do not go carefully into questions of
integrability. We shall for simplicity think of the boundary of R as consisting of a
finite number of surfaces, each of which is smooth except possibly at certain 
isolated points (e.g., the vertex of a cone) or along certain curves (e.g., the edges 
of a cube or the rims of a solid right circular cylinder). 

We take three sets of planes, parallel respectively to the x-, y-, and z-axes. 
The mesh of rectangular blocks which these planes form in space is called a 
rectangular partition. Those blocks, or cells, which belong entirely to R are 
numbered consecutively in any order. Let $\Delta V_k$ be the volume of the kth cell, and
let its x-, y-, and z-dimensions be $\Delta x_k$, $\Delta y_k$, $\Delta z_k$ respectively, so that
$\Delta V_k = \Delta x_k\,\Delta y_k\,\Delta z_k$. Finally, let $(x_k, y_k, z_k)$ be an arbitrarily selected point in the kth cell.
Then we define the triple integral of the function f over R by the following limit, as
the maximum dimensions of all the cells approach zero:

$$\iiint_R f(x, y, z)\,dV = \lim \sum_k f(x_k, y_k, z_k)\,\Delta V_k, \tag{13.6-1}$$

or, in another notation,

$$\iiint_R f(x, y, z)\,dx\,dy\,dz = \lim \sum_k f(x_k, y_k, z_k)\,\Delta x_k\,\Delta y_k\,\Delta z_k. \tag{13.6-2}$$


We take for granted that this limit exists and is independent of the particular 
method of forming the partitions and choosing the points $(x_k, y_k, z_k)$.

The analogue of Theorem II, §13.23, is true for triple integrals; that is, the 
integral is given by (13.6-1) when the subregions, instead of being rectangular 
blocks, are formed in any manner (as long as they are sufficiently regular in 
shape). They need not completely fill out the region R, provided that the amount 
of volume omitted approaches zero in the limit. These remarks are of im- 
portance for the understanding of what happens when we use cylindrical or 
spherical co-ordinates. 

The properties of double integrals explained in §13.21 extend at once to 
triple integrals. The same is true of the inequalities of §13.22, and the mean-value 
theorem. 

When it comes to devising an explanation of the evaluation of triple integrals 
by iterated integrals, we must proceed differently than in the case of double 
integrals, for no intuitive geometric procedure analogous to that of §13.3 is 
available to us (a four-dimensional space would be required). There is a direct 
analytical method, however. This method could have been used for double 
integrals as well. We shall give a heuristic account of the method, thus making 
its plausibility clear. A fully rigorous account is rather long, and it seems 
advisable to leave the details for later study. 

Let us first state the result. The letters x, y, z can be written in six possible 
orders. Corresponding to each such order there is an iterated integral evaluation
of the triple integral, calling for three successive single integrations. The main
problem of technique is that of learning how to write the limits of integration for
the iterated integrals. The notation for an iterated integral is illustrated by

$$\int_0^1 dy \int_0^{1-y} dx \int_{x+y}^1 (x^2 + y^2)\,dz. \tag{13.6-3}$$

The integrations in (13.6-3) are to be performed in the order z, x, y.

It will be enough to explain the transition from the triple integral to an 
iterated integral for one particular order of integration. Suppose this order is first 
with respect to z, then with respect to x, and finally with respect to y. Choosing a 
typical value of y, consider the cross section of R by a plane y = constant, 
parallel to the xz-plane. We assume that R is of such a shape that all the 
foregoing cross sections are plane regions of the type dealt with in our discussion
of iterated integrals in two dimensions. As shown in Fig. 113, let the
smallest and largest values of x in the cross section be respectively $X_1(y)$ and
$X_2(y)$, and let $Z_1(x, y)$, $Z_2(x, y)$ be the values of z for which a typical line parallel
to the z-axis in the cross section cuts the boundary of R. Finally, let y = a and
y = b be the extreme values of y in the region R. Then

$$\iiint_R f(x, y, z)\,dV = \int_a^b dy \int_{X_1}^{X_2} dx \int_{Z_1}^{Z_2} f(x, y, z)\,dz. \tag{13.6-4}$$

The formula (13.6-4) is the fundamental theorem about evaluating triple integrals 
by iterated integrals in rectangular co-ordinates. Before giving a heuristic 
justification of the formula we give an illustrative example. 

Example. Find the centroid of an octant of a solid sphere.

Let $x^2 + y^2 + z^2 = a^2$ be the equation of the surface of the sphere. We consider
the first octant. Evidently $\bar{x} = \bar{y} = \bar{z}$, so we find $\bar{x}$ only.
Analogous to (13.3-8) we have

$$V\bar{x} = \iiint_R x\,dV,$$

where V is the volume of R. In the present case $V = (\pi/6)a^3$. In the notation of
(13.6-4) we see from Fig. 114 that

$$Z_1 = 0, \quad Z_2 = \sqrt{a^2 - x^2 - y^2}, \quad X_1 = 0, \quad X_2 = \sqrt{a^2 - y^2}.$$

[Fig. 114.]

Hence

$$\frac{\pi}{6}a^3\bar{x} = \int_0^a dy \int_0^{\sqrt{a^2 - y^2}} dx \int_0^{\sqrt{a^2 - x^2 - y^2}} x\,dz = \int_0^a dy \int_0^{\sqrt{a^2 - y^2}} x\sqrt{a^2 - x^2 - y^2}\,dx.$$

The x-integration yields

$$-\tfrac{1}{3}[a^2 - y^2 - x^2]^{3/2}\Big|_0^{\sqrt{a^2 - y^2}} = \tfrac{1}{3}(a^2 - y^2)^{3/2}.$$

Hence

$$\frac{\pi}{6}a^3\bar{x} = \frac{1}{3}\int_0^a (a^2 - y^2)^{3/2}\,dy;$$

we omit the details of the last integration. Finally, then, $\bar{x} = \tfrac{3}{8}a$.


Now to explain (13.6-4). We go back to the definition (13.6-2). Let us single 
out all the cells of the partition which belong to R and lie in a particular column 
parallel to the z-axis (see Fig. 115). We may choose the points $(x_k, y_k, z_k)$ so that
the co-ordinates $x_k$, $y_k$ are the same for all the points belonging to cells in the
same vertical column. The values $\Delta x_k$ and $\Delta y_k$ will also be the same for all the cells
in one column, and the area of the base of the column will be $\Delta x_k\,\Delta y_k$. Let us
number the columns, say from 1 to N. Suppose $\Delta A_i$ is the area of the base of the
ith column, and suppose the number of cells in the ith column is $m_i$. Let the
points associated with these cells be $(x_i', y_i', z_{ij})$, $j = 1, \ldots, m_i$, and let their
z-dimensions be $\Delta z_{ij}$. Then the sum in (13.6-2) can be written in the form

$$\sum_{i=1}^{N}\left(\sum_{j=1}^{m_i} f(x_i', y_i', z_{ij})\,\Delta z_{ij}\right)\Delta A_i. \tag{13.6-5}$$

[Fig. 115.]


The inside sum here is of the type occurring in the definition of a definite 
integral with respect to z. The interval of z-values that is being subdivided is 
approximately from the lower to the upper bounding surface of R, that is, from 
$Z_1(x_i', y_i')$ to $Z_2(x_i', y_i')$. Hence the inner sum is an approximation to

$$\int_{Z_1(x_i', y_i')}^{Z_2(x_i', y_i')} f(x_i', y_i', z)\,dz. \tag{13.6-6}$$

For convenience let us write

$$g(x, y) = \int_{Z_1(x, y)}^{Z_2(x, y)} f(x, y, z)\,dz. \tag{13.6-7}$$

Then the expression (13.6-6) is $g(x_i', y_i')$, and (13.6-5) is seen to be approximately
equal to

$$\sum_{i=1}^{N} g(x_i', y_i')\,\Delta A_i \tag{13.6-8}$$

if the cell dimensions in the z-direction are all sufficiently small. This sum, in 
turn, is of the type occurring in the definition of a double integral. If T is the 
plane region obtained by projecting the points of R perpendicularly on the 
xy-plane, the bases of the columns form a rectangular partition of T. When the 
dimensions of the cells of this partition are small enough, the sum (13.6-8) is very 
nearly equal to the double integral 


$$\iint_T g(x, y)\,dA,$$

which in turn is equal to the iterated integral

$$\int_a^b dy \int_{X_1}^{X_2} g(x, y)\,dx, \tag{13.6-9}$$

as we see from Fig. 113. We see, therefore, on combining (13.6-7) and (13.6-9),
that the sum (13.6-5) is an approximation to the iterated integral

$$\int_a^b dy \int_{X_1}^{X_2} dx \int_{Z_1}^{Z_2} f(x, y, z)\,dz.$$

It may be shown in more detail that the approximation becomes better and better 
as we take the limit defining the triple integral, so that (13.6-4) is exactly true. 
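
The passage from column sums to the iterated integral (13.6-4) can be imitated in a short program. The sketch below is an illustration under stated assumptions, not the rigorous argument: it uses midpoint sums with z innermost, then x, then y, and it stretches the cells to fit the limits rather than using a fixed rectangular mesh. The function name `triple_iterated` and the choice a = 1 are hypothetical. It is applied to the octant of the sphere treated in the Example.

```python
import math


def triple_iterated(f, a, b, X1, X2, Z1, Z2, n=60):
    """Approximate int_a^b dy int_{X1(y)}^{X2(y)} dx int_{Z1(x,y)}^{Z2(x,y)} f dz
    by midpoint sums, in the order z, x, y of (13.6-4)."""
    total = 0.0
    dy = (b - a) / n
    for j in range(n):
        y = a + (j + 0.5) * dy
        x_lo, x_hi = X1(y), X2(y)
        dx = (x_hi - x_lo) / n
        for i in range(n):
            x = x_lo + (i + 0.5) * dx
            z_lo, z_hi = Z1(x, y), Z2(x, y)
            dz = (z_hi - z_lo) / n
            g = 0.0                          # approximates g(x, y) of (13.6-7)
            for k in range(n):
                z = z_lo + (k + 0.5) * dz
                g += f(x, y, z) * dz
            total += g * dx * dy             # one term of a sum like (13.6-8)
    return total


# First octant of the sphere x^2 + y^2 + z^2 = 1 (a = 1 in the Example).
limits = dict(
    a=0.0, b=1.0,
    X1=lambda y: 0.0,
    X2=lambda y: math.sqrt(1.0 - y * y),
    Z1=lambda x, y: 0.0,
    Z2=lambda x, y: math.sqrt(max(0.0, 1.0 - x * x - y * y)),
)
V = triple_iterated(lambda x, y, z: 1.0, **limits)
x_bar = triple_iterated(lambda x, y, z: x, **limits) / V
print(V, math.pi / 6.0)
print(x_bar, 3.0 / 8.0)
```

The approximations converge slowly near the curved boundary, where the limits have unbounded derivatives, so only a few correct digits should be expected for modest n.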

13.7 / APPLICATIONS OF TRIPLE INTEGRALS

Triple integrals may be used to calculate the locations of centers of gravity, the 
masses of solids of variable density, moments of inertia, and other quantities of 
physical or geometrical significance. The fundamental principles of such ap- 
plications are the same as those set forth in connection with double integrals 
(§13.5). 

We shall use the Greek letter $\mu$ for volume density. The mass of a solid of
variable density $\mu(x, y, z)$ occupying a region R is then

$$M = \iiint_R \mu\,dV.$$

The center of gravity $(\bar{x}, \bar{y}, \bar{z})$ is found from the formula

$$M\bar{x} = \iiint_R x\mu\,dV$$

and two other similar formulas. The moment of inertia about the z-axis is

$$I_z = \iiint_R (x^2 + y^2)\,\mu\,dV.$$

The product of inertia relative to the planes x = 0 and y = 0 is

$$U_{xy} = \iiint_R xy\mu\,dV.$$

Other moments of inertia $I_x$, $I_y$, and other products of inertia $U_{yz}$, $U_{zx}$ are defined
by analogous formulas.

Problems in gravitational attraction are mathematically almost identical with 
problems of electrostatic forces, since Newton’s law and Coulomb’s law are 
both inverse-square laws. There is a difference in sign, since two masses attract 
each other, whereas two positive charges repel. Newton’s law for mass particles 
m and m' at P and P', a distance r apart, states that m exerts on m' a force

$$\mathbf{F} = k\,\frac{mm'}{r^3}\,\overrightarrow{P'P},$$

where k is a universal constant depending only on the units of mass, distance,
and force. In theoretical work it is customary to choose units such that k = 1.
We shall do this. The force of attraction on a unit mass at Q, produced by a solid
of density $\mu$ occupying a region R, is

$$\mathbf{F} = \iiint_R \frac{\mu}{r^3}\,\overrightarrow{QP}\,dV,$$

where r = QP, $\mu$ is evaluated at P, and integration is carried out with respect to
the co-ordinates of P.

The concept of potential is useful in the theory of gravitational attraction. The 
potential at Q is defined to be 


$$u(Q) = \iiint_R \frac{\mu}{r}\,dV.$$

The relation between the potential u and the gravitational field force F is
expressed by the equation

$$\mathbf{F} = \nabla_Q u;$$

i.e., the field is the gradient of the potential. The situation is comparable to that 
in electrostatics (see §13.51); there, however, the field is the negative of the 
gradient of the potential. The difference in sign arises from the difference in sign 
between Newton’s and Coulomb’s laws. 


Example 1. The first octant portion of the solid inside
the cylinder $x^2 + y^2 = a^2$ and between the planes z = 0,
z = h has density $\mu = x$. Find its mass. We have

$$M = \iiint_R x\,dV = \int_0^h dz \int_0^a dy \int_0^{\sqrt{a^2 - y^2}} x\,dx;$$

$$M = \int_0^h dz \int_0^a \tfrac{1}{2}(a^2 - y^2)\,dy = \int_0^h \frac{a^3}{3}\,dz = \tfrac{1}{3}a^3h.$$

The finding of the limits of integration is illustrated in Fig. 116.

Example 2. Find the moment of inertia about the z-axis of the homogeneous
tetrahedron bounded by the planes z = x + y, x = 0, y = 0, z = 1. The integral in
this case is

$$I_z = \iiint_R \mu(x^2 + y^2)\,dV = \mu\int_0^1 dy \int_0^{1-y} dx \int_{x+y}^1 (x^2 + y^2)\,dz.$$

The limits of integration are found by an examination of Fig. 117. Completion of the
integration is left as an exercise for the student. The result is

$$I_z = \frac{\mu}{30}.$$

[Fig. 117.]

Since the volume of the tetrahedron is $\tfrac{1}{6}$, the mass is $M = \mu/6$, whence $\mu = 6M$ and
$I_z = M/5$. This means that the radius of gyration of the tetrahedron about the z-axis
is $k = 1/\sqrt{5}$.
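
The limits of integration in Example 2 translate directly into nested loops. The following sketch is illustrative only: it approximates the iterated integral by midpoint sums in the order z, x, y, once with integrand 1 (the volume) and once with $x^2 + y^2$, and then forms the radius of gyration from $I_z = Mk^2$. The function name and the value $\mu = 1$ are hypothetical.

```python
import math


def tetrahedron_integral(f, n=40):
    """Midpoint approximation of
    int_0^1 dy int_0^{1-y} dx int_{x+y}^1 f(x, y, z) dz."""
    total = 0.0
    dy = 1.0 / n
    for j in range(n):
        y = (j + 0.5) * dy
        dx = (1.0 - y) / n                   # x runs from 0 to 1 - y
        for i in range(n):
            x = (i + 0.5) * dx
            dz = (1.0 - (x + y)) / n         # z runs from x + y to 1
            for k in range(n):
                z = x + y + (k + 0.5) * dz
                total += f(x, y, z) * dz * dx * dy
    return total


mu = 1.0                                     # hypothetical constant density
M = mu * tetrahedron_integral(lambda x, y, z: 1.0)
I_z = mu * tetrahedron_integral(lambda x, y, z: x * x + y * y)
k_gyration = math.sqrt(I_z / M)              # from I_z = M k^2
print(M, I_z, k_gyration)
```

With $\mu = 1$ the printed values should be close to 1/6, 1/30, and $1/\sqrt{5}$ respectively.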

EXERCISES 

1. A homogeneous solid block is bounded by the planes x = 0, x = a, y = 0, y = b,
z = 0, z = c. Show that $I_x = (M/3)(b^2 + c^2)$.

2. Locate the center of gravity of the tetrahedron in Example 2. Consider some of 
the other orders of integration, and select a convenient order for each part of the work. 

3. (a) Locate the center of gravity of the solid tetrahedron cut from the first octant
by the plane (x/a) + (y/b) + (z/c) = 1. (b) Find the moment of inertia of this tetrahedron
about the y-axis.

4. Find $\bar{x}$ and $\bar{y}$ for the solid of Example 1.

5. The unit cube bounded by the planes x = 0, x = 1, y = 0, y = 1, z = 0, z = 1 has
density $\mu = xz$. (a) Find the mass and locate the center of gravity of the
cube. (b) Find the y-component of the attraction which the cube exerts on a unit mass
at the origin.

6. A homogeneous solid is bounded by the plane z = 0 and the paraboloid
$(x^2/a^2) + (y^2/b^2) + (z/c) = 1$. (a) Locate its center of gravity. (b) Find $I_x$, $I_y$, and $I_z$ for the solid.

7. Consider the homogeneous solid bounded by the ellipsoid 

$$(x^2/a^2) + (y^2/b^2) + (z^2/c^2) = 1.$$

(a) Locate the center of gravity of the first octant portion of this solid. 

(b) Calculate the moment of inertia of the entire solid about the x-axis. 

8. The density of a cube of edge 2a is proportional to the square of the distance
from its center, the density being unity at the center of each face. (a) Find the total
mass and the average density. (b) Find $I_z$ for this cube. (c) Find the ratio of the $I_z$ in
(b) to the value which $I_z$ would have if the cube had the same total mass distributed
uniformly throughout the volume.

9. If a solid has density at (x, y, z) equal to the distance from (x, y, z) to the origin, 
show that the potential at the origin is equal to the volume of the solid. 

10. Suppose a solid has density at (x, y, z) equal to the cube of the distance from 
(x, y, z) to the origin. Show that the attraction of the solid on a unit mass at the origin is a 
force directed toward the centroid of the volume occupied by the solid, and equal in 
magnitude to the product of the volume and the distance from the origin to the centroid. 

11. Let I denote the moment of inertia of a solid about a certain axis, and let I 0 be 
the moment of inertia about a parallel axis through the center of gravity. If h is the 
distance between the two axes, prove that $I = I_0 + Mh^2$, where M is the total mass of the
body.

12. Consider two rectangular co-ordinate systems with the same origin, the relations
between the xyz-system and the x'y'z'-system being expressed in the notation of §10.3.
For a given body, show that

$$I_{x'} = I_x l_1^2 + I_y m_1^2 + I_z n_1^2 - 2U_{yz}m_1n_1 - 2U_{zx}n_1l_1 - 2U_{xy}l_1m_1.$$

This shows how products of inertia enter into consideration of the effect of rotation of
axes upon moments of inertia.
The equation

$$I_x x^2 + I_y y^2 + I_z z^2 - 2U_{yz}yz - 2U_{zx}zx - 2U_{xy}xy = 1$$

defines what is called the ellipsoid of inertia for the body relative to the origin O. A set of
axes such that the products of inertia all vanish is called a set of principal axes of inertia
for the body.


13.8 / CYLINDRICAL CO-ORDINATES 


If we use polar co-ordinates in a plane, and a rectangular co-ordinate along an
axis perpendicular to the plane at the origin of the polar system, the combination is
called a cylindrical co-ordinate system. Most commonly the polar co-ordinates
are taken in the xy-plane (see Fig. 118), but there is no
logical necessity for this choice. It is often convenient to
evaluate a triple integral by an iterated integral in
cylindrical co-ordinates. As we saw in §13.6,

$$\iiint_R f(x, y, z)\,dV = \iint_T dA \int_{Z_1}^{Z_2} f(x, y, z)\,dz. \tag{13.8-1}$$

If we express the integrand in cylindrical co-ordinates, say $f(x, y, z) = F(r, \theta, z)$,
the double integral in (13.8-1) may be evaluated as an iterated integral in polar
co-ordinates. This leads to the result

$$\iiint_R F(r, \theta, z)\,dV = \int d\theta \int r\,dr \int_{Z_1}^{Z_2} F(r, \theta, z)\,dz. \tag{13.8-2}$$

Do not fail to observe the factor r which is introduced into the integrand of the
iterated integral. The limits $Z_1$, $Z_2$ must be expressed in terms of r and $\theta$; the r
and $\theta$ limits are found by inspection of the plane region T, the “shadow” of R on
the xy-plane (see Fig. 115). The result (13.8-2) and others like it may also be
obtained by an argument similar to that beginning after the Example in §13.6. 
There are five other possible orders of integration. A systematic method for 
determining the limits of integration for any given order is illustrated in the 
following example: 

Example. Find the moment of inertia of a homogeneous right circular cone
about its axis.

Let the radius of the base be b, the altitude be h. The density μ is constant,
so $M = \mu(\pi/3)b^2h$. We place the cone as shown in Fig. 119 (we draw only
one-fourth the cone). Let the integration order be r, z, θ. We must first set up the
triple integral:

$$I = \iiint_R \mu(x^2 + y^2)\,dV = \mu \iiint_R r^2\,dV.$$

Now picture a section of the region R made by holding the
last integration variable (here θ) constant. In the present
case this is the triangle OAB. Next assign the second
integration variable z a typical value, and determine the
range of freedom left to the first integration variable r.
This process is indicated in Fig. 119 by the line CD. Since
OC = z, the value of r at D is given by

$$\frac{r}{z} = \frac{b}{h}.$$

The r limits of integration are therefore 0 and zb/h.

Now let the line CD range in the z-direction as much
as it may (from 0 to h); these are the z-limits of
integration. Finally, let θ vary through all values necessary to have the
θ-sections sweep out the entire region R. We see that the θ-limits of integration are
0 and 2π. Therefore (remembering the additional factor r),

$$I = \mu \int_0^{2\pi} d\theta \int_0^{h} dz \int_0^{bz/h} r^3\,dr = \frac{\pi\mu}{10}\,b^4 h.$$

This may be written $I = \frac{3}{10}Mb^2$.
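The limits found in the example can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `cone_moment_of_inertia`, the midpoint rule, and the sample values of b, h, and the density are all assumptions chosen for illustration. It evaluates the iterated integral in the order r, z, θ and compares it with the closed form πμb⁴h/10.

```python
import math

def cone_moment_of_inertia(b, h, mu=1.0, n=200):
    """Midpoint-rule estimate of mu * (integral of r^3 dr dz dtheta) over the cone."""
    total = 0.0
    dz = h / n
    for j in range(n):
        z = (j + 0.5) * dz
        r_max = b * z / h              # r runs from 0 to bz/h at height z
        dr = r_max / n
        inner = 0.0
        for i in range(n):
            r = (i + 0.5) * dr
            inner += r**3 * dr         # r^2 from x^2 + y^2, one more r from dV
        total += inner * dz
    return mu * 2.0 * math.pi * total  # the theta-integration contributes 2*pi

b, h, mu = 2.0, 3.0, 1.5               # illustrative values only
numeric = cone_moment_of_inertia(b, h, mu)
exact = math.pi * mu * b**4 * h / 10.0
mass = mu * math.pi * b**2 * h / 3.0
print(numeric, exact, 0.3 * mass * b**2)
```

The three printed numbers should agree to several digits; the last one is the form (3/10)Mb².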

EXERCISES 

1. For the solid cone of the illustrative example find (a) $I_y$; (b) the location of the
center of gravity of the first octant portion; (c) the attraction exerted on a unit mass at
the origin; (d) the potential at a point (0, 0, ζ), where ζ < 0.

2. Find the moment of inertia of a homogeneous solid sphere of radius a, about a
diameter.

3. For a homogeneous solid right circular cylinder of height h and radius of base a,
find the moments of inertia (a) about the axis of the cylinder; (b) about a line through
the center of gravity of the cylinder, perpendicular to the axis of the cylinder; (c) the
attraction exerted by the cylinder on a unit mass at the center of one end; (d) the
potential at a point (0, 0, ζ), assuming the cylinder defined by $x^2 + y^2 \le a^2$, $0 \le z \le h$, and
assuming ζ ≥ h.

13.9 / SPHERICAL CO-ORDINATES 

To form a spherical co-ordinate system we start from an origin O and a fixed ray
issuing from O. We shall take the ray as the positive z-axis; there is, however,
no necessity for any one special relation between spherical and rectangular
co-ordinates. The spherical co-ordinates are the distance ρ = OP and the two
angles θ, φ (see Fig. 120). The angle θ, sometimes called the azimuth of P, is the
same as that used in plane polar co-ordinates. The angle φ is the colatitude of P.
We always choose φ in the range 0 ≤ φ ≤ π. For most work ρ is taken
nonnegative.

The student should be aware that in some books the roles of θ and φ are
reversed, so that θ denotes the colatitude. This is
particularly true of European texts, and books on
mathematical physics. With due caution the student
should have no trouble accommodating himself to such
differences in notation.

Fig. 119.

The standard formulas relating spherical to rectangular co-ordinates are

$$x = \rho \sin\phi \cos\theta, \quad y = \rho \sin\phi \sin\theta, \quad z = \rho \cos\phi. \tag{13.9-1}$$

Fig. 120.
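The formulas (13.9-1) translate directly into code. The sketch below uses this book's convention (θ the azimuth, φ the colatitude with 0 ≤ φ ≤ π); the function names are hypothetical, and the inverse map, including its choice of θ in [0, 2π), is an illustrative assumption rather than something stated in the text.

```python
import math

def spherical_to_rect(rho, theta, phi):
    """(13.9-1): theta is the azimuth, phi the colatitude."""
    x = rho * math.sin(phi) * math.cos(theta)
    y = rho * math.sin(phi) * math.sin(theta)
    z = rho * math.cos(phi)
    return x, y, z

def rect_to_spherical(x, y, z):
    """Inverse map with rho >= 0, 0 <= theta < 2*pi, 0 <= phi <= pi."""
    rho = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x) % (2.0 * math.pi)
    # clamp guards against rounding slightly outside [-1, 1]
    phi = math.acos(max(-1.0, min(1.0, z / rho))) if rho > 0.0 else 0.0
    return rho, theta, phi

# A point on the positive y-axis: azimuth pi/2, colatitude pi/2.
print(spherical_to_rect(2.0, math.pi / 2, math.pi / 2))  # approximately (0, 2, 0)
print(rect_to_spherical(1.0, 2.0, 2.0))
```

When reading a text that reverses the roles of θ and φ, only the argument order of these two functions needs to change.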

To explain the use of spherical co-ordinates in evaluating triple integrals, we
employ a method like that of §13.4. The curvilinear co-ordinates ρ, θ, φ give rise
to a partition of R into cells which resemble cubes. A typical such cell is
bounded by three pairs of surfaces, two each of the types ρ = constant (spheres), θ =
constant (half-planes through the z-axis), φ = constant
(cones). Such a cell is generated by taking a shaded area as
shown in Fig. 121 and turning it about the z-axis through an
angle Δθ. By formulas of elementary solid geometry the
volume of this cell is found to be

$$\Delta V = \tfrac{1}{3}\left[(\rho + \Delta\rho)^3 - \rho^3\right]\left[\cos\phi - \cos(\phi + \Delta\phi)\right]\Delta\theta. \tag{13.9-2}$$

Now, by the law of the mean, we have

$$(\rho + \Delta\rho)^3 - \rho^3 = 3\rho'^2\,\Delta\rho,$$
$$\cos(\phi + \Delta\phi) - \cos\phi = -\sin\phi'\,\Delta\phi,$$

where ρ′ is between ρ and ρ + Δρ and φ′ is between φ and φ + Δφ.
Thus (13.9-2) becomes

$$\Delta V = \rho'^2 \sin\phi'\,\Delta\rho\,\Delta\theta\,\Delta\phi.$$

This may be expressed by saying that there is in each cell, say the kth one, a
point $(\rho_k, \theta_k, \phi_k)$ such that the volume of the cell is

$$\Delta V_k = \rho_k^2 \sin\phi_k\,\Delta\rho_k\,\Delta\theta_k\,\Delta\phi_k. \tag{13.9-3}$$

It is worth while observing that the expression ρ² sin φ Δρ Δθ Δφ is evidently a
good approximation to ΔV in (13.9-2) when the dimensions of the cell are small,
for three concurrent edges of the cell have the lengths

$$\Delta\rho, \quad \rho\,\Delta\phi, \quad \rho \sin\phi\,\Delta\theta.$$
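The quality of the approximation ρ² sin φ Δρ Δθ Δφ can be seen numerically. This is a small sketch under assumed sample values (ρ = 2, φ = 1, equal increments); the function names are hypothetical. It compares the exact cell volume (13.9-2) with the product of the three edge lengths as the cell shrinks.

```python
import math

def exact_cell_volume(rho, phi, d_rho, d_theta, d_phi):
    """Exact volume of the cell, formula (13.9-2)."""
    return (((rho + d_rho)**3 - rho**3) / 3.0
            * (math.cos(phi) - math.cos(phi + d_phi)) * d_theta)

def approx_cell_volume(rho, phi, d_rho, d_theta, d_phi):
    """Product of the three concurrent edge lengths."""
    return rho**2 * math.sin(phi) * d_rho * d_theta * d_phi

for step in (0.1, 0.01, 0.001):
    exact = exact_cell_volume(2.0, 1.0, step, step, step)
    approx = approx_cell_volume(2.0, 1.0, step, step, step)
    print(step, exact / approx)   # the ratio tends to 1 as the cell shrinks
```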

Now consider a triple integral

$$\iiint_R F(\rho, \theta, \phi)\,dV,$$

the integrand being expressed in terms of spherical co-ordinates. We express this
integral as the limit of a sum

$$\sum_k F(\rho_k, \theta_k, \phi_k)\,\Delta V_k,$$

using the formula (13.9-3) for $\Delta V_k$, so that

$$\iiint_R F(\rho, \theta, \phi)\,dV = \lim \sum_k F(\rho_k, \theta_k, \phi_k)\,\rho_k^2 \sin\phi_k\,\Delta\rho_k\,\Delta\theta_k\,\Delta\phi_k. \tag{13.9-4}$$

By looking upon the equations (13.9-1) as defining a mapping between an
xyz-space and a ρθφ-space, we interpret the limit on the right in (13.9-4) as
defining a triple integral over the image of R in the ρθφ-space (compare with
(13.4-4)), and so arrive at the iterated integral

$$\iiint_R F(\rho, \theta, \phi)\,dV = \int d\theta \int d\phi \int F(\rho, \theta, \phi)\,\rho^2 \sin\phi\,d\rho, \tag{13.9-5}$$

or any one of five other iterated integrals. The essential thing to observe is the
factor ρ² sin φ on the right. The finding of the limits of integration follows the
same principles illustrated in §13.8 and earlier work.

Spherical co-ordinates seem to be particularly convenient for many
gravitational-attraction problems.

Example. The smaller volume bounded by the sphere $x^2 + y^2 + z^2 = 4a^2$ and
the plane z = a is filled with a homogeneous solid. Find the gravitational field
which this solid produces at the origin.

The density μ is constant; the force is in the z-direction, so $F_1 = F_2 = 0$. In
the integral for $F_3$ we have ψ = φ; hence

$$F_3 = \mu \iiint_R \frac{\cos\phi}{\rho^2}\,dV.$$

We integrate in this order: first ρ, then θ, and finally φ. Finding the limits is illustrated
in Fig. 122 and Fig. 123. The equation of the plane z = a is written as ρ = a sec φ;
that of the sphere is ρ = 2a. The two surfaces meet where cos φ = 1/2, so φ ranges from 0 to π/3.
Putting in the factor ρ² sin φ, we have

$$F_3 = \mu \int_0^{\pi/3} d\phi \int_0^{2\pi} d\theta \int_{a\sec\phi}^{2a} \sin\phi \cos\phi\,d\rho,$$

$$F_3 = 2\pi\mu \int_0^{\pi/3} (2a \sin\phi \cos\phi - a \sin\phi)\,d\phi = 2\pi\mu a\left[\sin^2\phi + \cos\phi\right]_0^{\pi/3} = \frac{\pi\mu a}{2}.$$
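As a check on the limits, the iterated integral can be evaluated numerically. This is only a sketch: the function name `attraction_f3`, the midpoint rule, and the sample values of a and the density are assumptions for illustration, and units are taken so that no gravitational constant appears, as in the integral above. The bracket 2πμa[sin²φ + cos φ] taken between 0 and π/3 equals πμa/2, which is the value compared against.

```python
import math

def attraction_f3(a, mu=1.0, n=2000):
    """Midpoint rule in phi for the z-component of the attraction at the origin."""
    total = 0.0
    d_phi = (math.pi / 3.0) / n
    for i in range(n):
        phi = (i + 0.5) * d_phi
        rho_low = a / math.cos(phi)    # the plane z = a, i.e. rho = a sec(phi)
        rho_high = 2.0 * a             # the sphere rho = 2a
        # sin(phi) cos(phi) does not depend on rho, so the rho-integral
        # is the integrand times the length of the rho-interval
        total += math.sin(phi) * math.cos(phi) * (rho_high - rho_low) * d_phi
    return 2.0 * math.pi * mu * total  # the theta-integration contributes 2*pi

a, mu = 1.5, 2.0                       # illustrative values only
print(attraction_f3(a, mu), math.pi * mu * a / 2.0)
```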



Fig. 122.

Fig. 123.

EXERCISES

1. Find the moment of inertia of a homogeneous spherical solid about a diameter,
using spherical co-ordinates.

2. Locate the centroid of an octant of a solid sphere, using spherical co-ordinates.

3. Consider the homogeneous solid cone defined by $h^2(x^2 + y^2) \le b^2 z^2$, $0 \le z \le h$.
Find (a) the attraction it exerts on a unit mass at the origin; (b) the moment of inertia
about the z-axis; (c) the moment of inertia about the y-axis.

4. Consider the homogeneous solid sphere whose surface is defined by
ρ = 2a cos φ. It is divided into two parts by the plane z = a. (a) Find the attraction which
each part exerts on a unit mass at the origin. (b) What is the ratio of the magnitudes of
these attractions? (c) What is the combined attraction of both parts?

5. Suppose the sphere $x^2 + y^2 + (z - h)^2 = a^2$, where 0 < a ≤ h, is filled with matter of
constant density μ. Show that its attraction on a unit mass particle at the origin is
$\frac{4}{3}\pi\mu a^3/h^2$, and hence that the attraction is the same as though all the mass were
concentrated at the center of the sphere. Begin by showing that a typical ray from O
pierces the sphere at the two points given by $\rho = h\cos\phi \pm (h^2\cos^2\phi + a^2 - h^2)^{1/2}$, and
that the range of φ values to be considered is $0 \le \phi \le \sin^{-1}(a/h)$.

6. In Exercise 5, suppose that 0 ≤ h < a, so that the origin is inside the sphere. In this
case show that the attraction is $\frac{4}{3}\pi\mu h$. This shows that, if the sphere is divided up into a
smaller concentric sphere of radius h, and a shell between the spheres of radii a and h,
the net force of attraction due to the matter in the shell is zero, the total force being just
that produced by the matter in the sphere of radius h, on whose surface O lies.

7. Show that the value of the integral

$$\int_0^{\pi} \frac{(c - \rho\cos\phi)\sin\phi}{(c^2 + \rho^2 - 2c\rho\cos\phi)^{3/2}}\,d\phi$$

is $2/c^2$ if c > ρ ≥ 0, and is equal to 0 if ρ > c ≥ 0. Also show that it is equal to $1/c^2$ if
ρ = c > 0. Some of these results are useful in Exercise 8. If ρ ≠ c and cρ ≠ 0, the integral
may be evaluated by making the substitution $t^2 = c^2 + \rho^2 - 2c\rho\cos\phi$, $t\,dt = c\rho\sin\phi\,d\phi$.

8. Consider a solid sphere of radius a, center at the origin, not necessarily of
constant density, but with the mass distributed symmetrically about the center, so that μ
depends only on ρ: μ = μ(ρ). Show that the mass of the sphere is

$$M = 4\pi \int_0^a \rho^2 \mu(\rho)\,d\rho.$$

If a unit mass particle is h units from the center of the sphere, show that it is attracted
with a force $M/h^2$ if h ≥ a, and with a force

$$\frac{4\pi}{h^2} \int_0^h \rho^2 \mu(\rho)\,d\rho$$

if 0 < h < a. If the density is constant, these results are the same as those obtained in a
different way in Exercises 5, 6.



14 / CURVES AND SURFACES


14 / INTRODUCTION

Curves and surfaces are geometric entities with which the student is to some 
extent familiar. The simplest examples of these entities, such as the conic curves 
in the plane, and spheres, cylinders, cones, and other quadric surfaces in space, 
have been encountered repeatedly from analytic geometry through calculus. 
Geometrical interpretations of functions of one or two independent variables 
have led the student to think of curves and surfaces in quite general terms. In 
this chapter we propose to make a careful study of the means by which we 
render our intuitive notions about curves and surfaces amenable to precise 
mathematical treatment. This is done partly as an introduction to a branch of 
geometry — what is known as differential geometry — and partly as preparation for 
the following chapter on line and surface integrals. 

A point to be emphasized is this: Our intuitive notions about curves and 
surfaces are all derived from relatively simple examples of these things. The 
general concepts of curves and surfaces are very inclusive, however, and in our 
studies we must remember that when we wish to prove something, we must 
appeal to the definitions and previously established theorems, not solely to our 
intuitions, which may present us with an oversimplified picture. Direct geometric 
visualization of the subjects of our discussion is, however, of great value, both 
for the suggestions we can derive and for the better understanding and retention 
of what we learn. 

14.1 / REPRESENTATIONS OF CURVES 

Intuitively we think of a curve as a one-dimensional configuration, like the path 
of a moving particle, or as something we might obtain by bending and twisting a 
straight line. We shall define a curve by saying that it is an ordered configuration 
of points (x, y, z) given by three continuous functions of a parameter: 

$$x = f(t), \quad y = g(t), \quad z = h(t); \tag{14.1-1}$$

the range of the parameter is to be some interval (finite or infinite) of the real 
axis. We speak of (14.1-1) as a parametric representation of the curve. A curve 
may have more than one parametric representation. If we interpret t as time, 
(14.1-1) may be regarded as defining the path of a moving point. The point may 
pass through the same position in space several times; in this case the curve 
intersects itself. Evidently a curve in the above sense of the word is very
general, and may not be very smooth. Imagine, for instance, the track of a tiny
particle in Brownian movement over a long period of time.

To avoid some of the complexities which may occur in dealing with curves
in general, we introduce some restrictions. By an arc we mean a curve which
does not intersect itself, which has two distinct ends, and which is represented in
the form (14.1-1) with a finite range of the parameter t, say a ≤ t ≤ b, where
a < b. A semicircle is an arc, but an entire circle is not. If a curve is defined by
(14.1-1) with a finite range a ≤ t ≤ b of the parameter, and if the points
corresponding to t = a and t = b are coincident, we say that the curve is closed
(it has no free ends). Such a closed curve without self-intersections is called a
simple closed curve, or a closed Jordan curve (after a French mathematician).
The prototype of an arc is the unit interval 0 ≤ x ≤ 1, while the prototype of a
simple closed curve is the unit circle $x^2 + y^2 = 1$. Continuous deformation (bending,
twisting, stretching, shrinking) of an arc leaves it still an arc, provided no
points are brought together which were originally distinct. The like can be said
about closed Jordan curves. The boundary of a square is a closed Jordan curve.
A parabola is neither an arc nor a closed curve. It may be regarded, however, as
an infinite number of arcs joined end to end.

A curve is called smooth if two conditions are satisfied: (1) it does not
intersect itself, and (2) it has a tangent line at each point, whose direction varies
continuously as the point moves along the curve. The second of these conditions
is satisfied if the functions f, g, h in (14.1-1) have continuous
derivatives which do not all vanish together for any value of t. The direction of
the tangent line is specified by the ratios

$$f'(t) : g'(t) : h'(t), \tag{14.1-2}$$

and we can find direction cosines by normalization (i.e., division by the square
root of the sum of the squares of the three quantities in (14.1-2)).
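The normalization step can be sketched in a few lines. The helper name `direction_cosines` and the helix used as a test curve are illustrative assumptions, not taken from the text; the derivatives are supplied as ordinary functions of t.

```python
import math

def direction_cosines(df, dg, dh, t):
    """Normalize the ratios f'(t) : g'(t) : h'(t) of (14.1-2)."""
    d = (df(t), dg(t), dh(t))
    norm = math.sqrt(sum(c * c for c in d))
    if norm == 0.0:
        # all three derivatives vanish together, so the smoothness
        # condition gives no tangent direction at this parameter value
        raise ValueError("derivatives vanish together at t = %r" % (t,))
    return tuple(c / norm for c in d)

# Assumed example: the helix x = cos t, y = sin t, z = t, at t = 0.
cosines = direction_cosines(lambda t: -math.sin(t), math.cos, lambda t: 1.0, 0.0)
print(cosines)
```

At t = 0 the derivative ratios are 0 : 1 : 1, so the tangent makes equal angles with the y- and z-axes and is perpendicular to the x-axis.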

The curves with which we normally deal either are smooth or each bounded 
portion of the curve is sectionally smooth , that is, composed of a finite number 
of smooth arcs joined end to end. Such a curve may have corners at the junction 
points. The periphery of a square is an example of a closed curve with corners. 

Frequently we encounter curves as the intersections of two surfaces. A 
parametric representation of the form (14.1-1) may not immediately present 
itself, but may theoretically be derived from the analytical representations of the 
surfaces by implicit-function arguments. 


14.2 / ARC LENGTH 

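The length of a smooth arc from t = a to t = b is given by an integral:

$$L = \int_a^b \left[\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 + \left(\frac{dz}{dt}\right)^2\right]^{1/2} dt. \tag{14.2-1}$$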

This formula, or the special form of it for plane curves, is familiar from 
elementary calculus. The derivation will now be given. 

Consider any curve C, closed or not, without self-intersections, and defined
by (14.1-1) with a finite parameter interval a ≤ t ≤ b. Consider a subdivision

$$a = t_0 < t_1 < t_2 < \cdots < t_n = b$$

of the interval (a, b) into n subintervals, and let $P_k$ be the point of C corresponding
to $t = t_k$. Joining the points $P_0, P_1, \ldots, P_n$ in order, we obtain a polygonal line
inscribed in C. The length of the polygonal line is

$$\overline{P_0P_1} + \overline{P_1P_2} + \cdots + \overline{P_{n-1}P_n}. \tag{14.2-2}$$

Now consider what happens if we allow n to increase, and make the length of
the longest of the segments $P_{k-1}P_k$ approach zero. If the sum (14.2-2) approaches
a finite limit, we call this limit the length of the curve C, and we say
that C is rectifiable. (For a curve that is not rectifiable, see Exercise 11.) A curve
may be rectifiable without being smooth or even sectionally smooth; we shall
confine ourselves to showing that a smooth arc is rectifiable, with length given
by (14.2-1). The length of a sectionally smooth curve is found by adding the
lengths of its component arcs.

Consider a segment $P_{k-1}P_k$ of the polygonal line. Its length is

$$\left[(\Delta x_k)^2 + (\Delta y_k)^2 + (\Delta z_k)^2\right]^{1/2}, \tag{14.2-3}$$

where $\Delta x_k = f(t_k) - f(t_{k-1})$,
with similar formulas for $\Delta y_k$ and $\Delta z_k$. By the law of the mean we find

$$\Delta x_k = f'(\alpha_k)\,\Delta t_k, \quad \Delta y_k = g'(\beta_k)\,\Delta t_k, \quad \Delta z_k = h'(\gamma_k)\,\Delta t_k,$$

where $\Delta t_k = t_k - t_{k-1}$ and the points $\alpha_k, \beta_k, \gamma_k$ are between $t_{k-1}$ and $t_k$. Thus
(14.2-2) becomes

$$\sum_{k=1}^{n} \left[(f'(\alpha_k))^2 + (g'(\beta_k))^2 + (h'(\gamma_k))^2\right]^{1/2} \Delta t_k. \tag{14.2-4}$$

If the points $\alpha_k, \beta_k, \gamma_k$ were all the same, the limit of the sum (14.2-4) would be,
by definition, the integral

$$\int_a^b \left[(f'(t))^2 + (g'(t))^2 + (h'(t))^2\right]^{1/2} dt,$$

which is the same as (14.2-1). Because the three points need not be the same,
more argument is needed. The matter is covered, however, by a general theorem
(Duhamel’s principle) in the theory of integration, and we refer the student to
Example 2, §18.21, for the final discussion of the issue.
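The definition of length as a limit of inscribed polygonal lengths is easy to watch in action. The sketch below is illustrative only: the function name `polygon_length`, the uniform subdivision, and the choice of one turn of the helix x = cos t, y = sin t, z = t (for which the integral (14.2-1) gives 2π√2) are assumptions made for the example.

```python
import math

def polygon_length(f, g, h, a, b, n):
    """Length of the polygonal line through P_0, ..., P_n (uniform subdivision)."""
    ts = [a + (b - a) * k / n for k in range(n + 1)]
    pts = [(f(t), g(t), h(t)) for t in ts]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

exact = 2.0 * math.pi * math.sqrt(2.0)   # value of (14.2-1) for one turn of the helix
for n in (4, 16, 64, 1000):
    approx = polygon_length(math.cos, math.sin, lambda t: t, 0.0, 2.0 * math.pi, n)
    print(n, approx, exact - approx)      # the gap shrinks as n grows
```

Each chord is no longer than the arc it subtends, so the polygonal lengths approach the arc length from below.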

Let C be a smooth arc and let s be the length measured along C from t = a
to a variable point. It follows from (14.2-1), with the upper limit b replaced by a
variable, that

$$\frac{ds}{dt} = \left[\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 + \left(\frac{dz}{dt}\right)^2\right]^{1/2}. \tag{14.2-5}$$

Hence

$$ds^2 = dx^2 + dy^2 + dz^2. \tag{14.2-6}$$

The integral giving the arc length is frequently such that no evaluation of
the integral can be made in terms of elementary functions. This is the case even
with plane curves. Sometimes the arc length is expressible in terms of tabulated
standard integrals, such as elliptic integrals.

Example. Consider the first octant portion of the curve of intersection of the
sphere and cylinder

$$x^2 + y^2 + z^2 = 4a^2, \quad x^2 + (y - a)^2 = a^2, \tag{14.2-7}$$

as shown in Fig. 124.

It is convenient to use z as a parameter for this
curve. If we eliminate x by subtracting the two
equations in (14.2-7), we find

$$y = \frac{4a^2 - z^2}{2a}.$$

Substituting this result in the second of the equations
(14.2-7), we find

$$x = \frac{z}{2a}\sqrt{4a^2 - z^2}.$$

Fig. 124.

As parametric equations of the curve, we have

$$x = \frac{z}{2a}\sqrt{4a^2 - z^2}, \quad y = \frac{4a^2 - z^2}{2a}, \quad z = z.$$

A direct calculation shows that

$$ds^2 = \frac{8a^2 - z^2}{4a^2 - z^2}\,dz^2.$$

The range of z is from 0 to 2a, so the length of the first-octant portion of the
curve is

$$\int_0^{2a} \sqrt{\frac{8a^2 - z^2}{4a^2 - z^2}}\,dz. \tag{14.2-8}$$

This integral is improper at the limit z = 2a, but it is convergent. It can be put in
the form of a standard elliptic integral of the second kind. For further discussion
of this problem see Exercise 8.
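A numerical cross-check of this example is sketched below. It rests on one assumption beyond the text: the substitution z = 2a sin u, which removes the singularity at z = 2a and turns the integrand into 2√2 a √(1 − ½ sin² u) on 0 ≤ u ≤ π/2. The function names and the midpoint rule are likewise illustrative. The integral is compared with the length of an inscribed polygonal line built from the parametric equations of the curve.

```python
import math

def curve_point(z, a):
    """Point of the curve with z as parameter: x = (z/2a) sqrt(4a^2 - z^2), etc."""
    root = math.sqrt(max(4.0 * a * a - z * z, 0.0))
    return (z * root / (2.0 * a), (4.0 * a * a - z * z) / (2.0 * a), z)

def polygon_length(a, n=2000):
    """Inscribed polygon; vertices at z = 2a sin(u), u uniform on [0, pi/2]."""
    pts = [curve_point(2.0 * a * math.sin(0.5 * math.pi * k / n), a)
           for k in range(n + 1)]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

def integral_length(a, n=2000):
    """Midpoint rule for the arc-length integral after the substitution z = 2a sin(u)."""
    du = 0.5 * math.pi / n
    return sum(2.0 * math.sqrt(2.0) * a
               * math.sqrt(1.0 - 0.5 * math.sin((k + 0.5) * du) ** 2) * du
               for k in range(n))

print(polygon_length(1.0), integral_length(1.0))
```

The two values should agree closely; both approximate the arc length of the first-octant portion for a = 1.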


EXERCISES 

The standard elliptic integral of the second kind is defined as

$$E(k, \phi) = \int_0^{\phi} \sqrt{1 - k^2 \sin^2 t}\,dt,$$

where 0 < k < 1. If φ = π/2, the integral is called complete. Values of this integral for
various values of the parameters k, φ are given in many books of tables. In some of the
exercises it is required that the arc length be expressed in the form of such a standard
integral.

1. Consider the cylindrical helix x = a cos θ, y = a sin θ, z = kθ. Show that the arc
length from $\theta = \theta_0$ to $\theta = \theta_1$ is directly proportional to $\theta_1 - \theta_0$.

2. A point moves on a conical helix according to the formulas x = t cos ωt,
y = t sin ωt, z = t, where t is the time. Find the speed of the point at t = 0 and at t = 1,
and its average speed during this time interval.

3. Find the length of the curve x = a(θ − sin θ), y = a(1 − cos θ), z = 4a sin(θ/2),

4. If 0 < b < 4a, show that the length of the curve x = a(θ − sin θ), y = a(1 − cos θ),
z = b sin(θ/2) from θ = 0 to θ = 2π is

$$\int_0^{2\pi} \left[4a^2 - \left(4a^2 - \frac{b^2}{4}\right)\cos^2\frac{\theta}{2}\right]^{1/2} d\theta$$

and that this is equal to 8aE(k, π/2), where $k^2 = 1 - (b^2/16a^2)$.

5. If 0 < 4a < b, show that the arc length sought in Exercise 4 is 2bE(k, π/2), where
$k^2 = 1 - (16a^2/b^2)$.


6. Show that the total circumference of an ellipse of major axis 2a and eccentricity
e is 4aE(e, π/2). Start with the parametric form x = a sin t, y = b cos t, $b^2 = a^2(1 - e^2)$.

7. Consider the intersection of the ellipsoid $(x^2/a^2) + (y^2/b^2) + (z^2/c^2) = 1$ and the
cylinder $x^2 + y^2 = b^2$, where a > b. Show that the first-octant portion of this curve has arc
length

$$\frac{c\sqrt{a^2 - b^2}}{ak}\,E\!\left(k, \frac{\pi}{2}\right), \quad \text{where } k^2 = \frac{c^2(a^2 - b^2)}{c^2(a^2 - b^2) + a^2 b^2}.$$

8. Show that the integral (14.2-8) is equal to $2\sqrt{2}\,aE(\sqrt{2}/2, \pi/2)$. This may be done
either by making a suitable change of variable in (14.2-8) or by using the parametric
representation of the original curve: x = a sin 2t, y = a + a cos 2t, z = 2a sin t.

9. Find the length of the curve x = a cosh t, y = a sinh t, z = at, from t = 0 to $t = t_1$.
This curve lies on the hyperbolic cylinder $x^2 - y^2 = a^2$.

10. A point starts at (0, a, 0) and moves into the first octant along the curve
y = a cosh(x/a), z = a sinh(x/a). Show that when it is at (x, y, z) it has traveled a distance
$\sqrt{2}\,z$.

11. Consider the plane curve defined by y = f(x), 0 ≤ x ≤ 2/π, where f(x) = x sin(1/x) if
x ≠ 0 and f(0) = 0. Show by direct use of the definition of a rectifiable curve that this
curve is not rectifiable. Hint: Consider the points of the curve with x-coordinates 0 and
2/(kπ), k = 1, 2, . . . , 2n.


14.3 / THE TANGENT VECTOR 


Let C be a smooth arc. Let us choose a certain direction along the arc as the
positive direction. At any point of C we shall define what we
call the tangent vector. This is a vector of unit length along the
tangent line in the positive direction. See Fig. 125. It is
denoted by T; the vector T is a vector point function (in the
sense of §10.51) defined along C. If the angles which T makes
with the positive co-ordinate axes are α, β, γ we can write

$$\mathbf{T} = \cos\alpha\,\mathbf{i} + \cos\beta\,\mathbf{j} + \cos\gamma\,\mathbf{k}.$$

If s is arc length measured along C in the positive direction from some fixed
point, the direction cosines are given by cos α = dx/ds, and so on, so that

$$\mathbf{T} = \frac{dx}{ds}\,\mathbf{i} + \frac{dy}{ds}\,\mathbf{j} + \frac{dz}{ds}\,\mathbf{k}. \tag{14.3-1}$$

If x, y, z are functions of t with continuous derivatives which are never all
simultaneously zero,

$$\frac{ds}{dt} = \pm\left[\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 + \left(\frac{dz}{dt}\right)^2\right]^{1/2}, \tag{14.3-2}$$

with the plus sign chosen if s and t increase in the same sense along C;
otherwise the minus sign is chosen. Since

$$\frac{dx}{dt} = \frac{dx}{ds}\,\frac{ds}{dt}$$

and so on, we see that T may be found from the formula

$$\mathbf{T} = \frac{1}{ds/dt}\left(\frac{dx}{dt}\,\mathbf{i} + \frac{dy}{dt}\,\mathbf{j} + \frac{dz}{dt}\,\mathbf{k}\right) \tag{14.3-3}$$

together with (14.3-2).

For many purposes it is convenient to discuss T entirely in vector notation, 
without the use of a co-ordinate system. If P(x, y, z) is a variable point on C, we 
denote the vector OP by R. In the co-ordinate notation 

R = xi + yj + zk. (14.3-4) 

Equation (14.3-1) may be written

$$\mathbf{T} = \frac{d\mathbf{R}}{ds}. \tag{14.3-5}$$

This formula for the tangent vector can be derived directly by vector methods;
we give the derivation, because it provides an instructive exercise in learning
how to differentiate a vector function of a scalar variable.

We consider R as a function of s. Let the points P, P′ on C correspond to s
and s + Δs, respectively. Write

$$\overrightarrow{OP} = \mathbf{R}, \quad \overrightarrow{OP'} = \mathbf{R} + \Delta\mathbf{R},$$

so that

$$\Delta\mathbf{R} = \overrightarrow{PP'}.$$

The quotient ΔR/Δs is a vector along the line of the chord
PP′. See Fig. 126. Since the length of ΔR is the length of the
chord PP′, we see that when P′ approaches P the limit of
the length of ΔR/Δs is unity. Furthermore, the limiting
direction of PP′ is that of the tangent at P. Therefore

$$\frac{d\mathbf{R}}{ds} = \lim_{\Delta s \to 0} \frac{\Delta\mathbf{R}}{\Delta s} = \mathbf{T}.$$

Fig. 126.

Differentiation of R with respect to t gives

$$\frac{d\mathbf{R}}{dt} = \frac{d\mathbf{R}}{ds}\,\frac{ds}{dt} = \frac{ds}{dt}\,\mathbf{T}. \tag{14.3-6}$$

This is equivalent to formula (14.3-3). If t is the time variable, dR/dt is the vector
velocity of the point P moving on C.
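The limiting behaviour of the chord quotient ΔR/Δs can be observed numerically. The sketch below assumes a specific curve, the helix R(s) = (cos(s/√2), sin(s/√2), s/√2), which is parametrized by arc length; this curve, the function names, and the sample point s = 1 are illustrative choices, not taken from the text.

```python
import math

C = math.sqrt(2.0)

def R(s):
    """Helix parametrized by arc length s (assumed example)."""
    return (math.cos(s / C), math.sin(s / C), s / C)

def T(s):
    """Unit tangent dR/ds of the same helix, computed by hand."""
    return (-math.sin(s / C) / C, math.cos(s / C) / C, 1.0 / C)

def chord_quotient(s, ds):
    """The vector (R(s + ds) - R(s)) / ds."""
    p, q = R(s), R(s + ds)
    return tuple((qi - pi) / ds for pi, qi in zip(p, q))

s0 = 1.0
for ds in (0.5, 0.05, 0.005):
    dq = chord_quotient(s0, ds)
    length = math.sqrt(sum(c * c for c in dq))
    gap = max(abs(u - v) for u, v in zip(dq, T(s0)))
    print(ds, length, gap)   # length tends to 1, gap to 0
```

The printed length is the ratio of chord to arc, which approaches unity, while the components of the quotient approach those of T.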


EXERCISES 

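1. Let F and G denote vector functions of a scalar variable t. Assuming that F and G
are differentiable, prove the formulas

(a) $\dfrac{d}{dt}(\mathbf{F} \cdot \mathbf{G}) = \mathbf{F} \cdot \dfrac{d\mathbf{G}}{dt} + \dfrac{d\mathbf{F}}{dt} \cdot \mathbf{G}$,

(b) $\dfrac{d}{dt}(\mathbf{F} \times \mathbf{G}) = \mathbf{F} \times \dfrac{d\mathbf{G}}{dt} + \dfrac{d\mathbf{F}}{dt} \times \mathbf{G}$,

using the same method by which the rule for differentiating products is derived in
elementary calculus.

2. If F in Exercise 1 is a vector of constant length, prove that $\mathbf{F} \cdot \dfrac{d\mathbf{F}}{dt} = 0$, and thus
conclude that $\dfrac{d\mathbf{F}}{dt}$ is perpendicular to F unless $\dfrac{d\mathbf{F}}{dt} = 0$.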


14.31 / PRINCIPAL NORMAL. CURVATURE 

In this section and the next we continue with the notations used in §14.3. We
shall define two more unit vectors, the principal normal N and the binormal B,
which, along with the tangent vector T, form an orthonormal set of vectors
associated with the point (x, y, z) on the curve C. These vectors, especially T
and N, are important in the study of the motion of the point (x, y, z) along
the curve. The acceleration vector lies in the plane of T and N. The component
of the acceleration along the line of N will be shown to depend on the curvature of C.
We shall assume that x, y, z have second derivatives with respect to s. From (14.3-1)
we have

$$\frac{d\mathbf{T}}{ds} = \frac{d^2x}{ds^2}\,\mathbf{i} + \frac{d^2y}{ds^2}\,\mathbf{j} + \frac{d^2z}{ds^2}\,\mathbf{k}. \tag{14.31-1}$$

We shall prove that

$$\mathbf{T} \cdot \frac{d\mathbf{T}}{ds} = 0. \tag{14.31-2}$$

This means that dT/ds is either 0 or perpendicular to T. The demonstration may be
given in several ways. From (14.2-6) we have

$$1 = \left(\frac{dx}{ds}\right)^2 + \left(\frac{dy}{ds}\right)^2 + \left(\frac{dz}{ds}\right)^2.$$

Therefore, differentiating with respect to s,

$$0 = 2\left(\frac{dx}{ds}\frac{d^2x}{ds^2} + \frac{dy}{ds}\frac{d^2y}{ds^2} + \frac{dz}{ds}\frac{d^2z}{ds^2}\right).$$

The right member of this formula is exactly 2T · dT/ds, as we see from (14.3-1) and
(14.31-1) and the formula for the dot product in terms of components. Thus
(14.31-2) is proved. Alternatively (14.31-2) is a special case of the result proved
in Exercise 2, §14.3.
When dT/ds is not 0, its direction is perpendicular to T. A unit vector in the
direction of dT/ds is called the principal normal to C at the point in question. The
principal normal is denoted by N, and the length of dT/ds is denoted by κ.
Therefore

$$\frac{d\mathbf{T}}{ds} = \kappa\mathbf{N}. \tag{14.31-3}$$

The scalar κ is called the curvature of C; it is given by

$$\kappa = \left[\left(\frac{d^2x}{ds^2}\right)^2 + \left(\frac{d^2y}{ds^2}\right)^2 + \left(\frac{d^2z}{ds^2}\right)^2\right]^{1/2}. \tag{14.31-4}$$

This definition makes the curvature always either positive or zero. In the
case of plane curves it is easy to see, by sketching a figure, that the principal
normal lies in the plane of the curve and points toward the concave side of the
curve. In the theory of plane curves, the principal normal is called simply the
normal.

If dT/ds = 0, our definition of N breaks down. This happens at all points of C if
C is a straight line, and in this case there is no basis for calling any particular
normal the principal one. Otherwise the vanishing of dT/ds is exceptional.

The principal normal and the curvature play an important role in connection
with the acceleration of a particle moving in a curve.

We saw earlier that the vector velocity is

$$\mathbf{v} = \frac{d\mathbf{R}}{dt}.$$

The vector acceleration is therefore

$$\frac{d\mathbf{v}}{dt} = \frac{d^2\mathbf{R}}{dt^2}.$$

Now, from (14.3-6), we see that

$$\frac{d^2\mathbf{R}}{dt^2} = \frac{ds}{dt}\,\frac{d\mathbf{T}}{dt} + \frac{d^2s}{dt^2}\,\mathbf{T}.$$

Also,

$$\frac{d\mathbf{T}}{dt} = \frac{d\mathbf{T}}{ds}\,\frac{ds}{dt}.$$

Therefore, using (14.31-3), we see that

$$\frac{d^2\mathbf{R}}{dt^2} = \frac{d^2s}{dt^2}\,\mathbf{T} + \kappa\left(\frac{ds}{dt}\right)^2\mathbf{N}. \tag{14.31-5}$$

The acceleration vector is made up of a component of magnitude $d^2s/dt^2$ along the
tangent vector, and one of magnitude $\kappa(ds/dt)^2$ along the principal normal.

The reciprocal of the curvature is called the radius of curvature ρ:

$$\rho = \frac{1}{\kappa}. \tag{14.31-6}$$

If we write $v = |ds/dt|$ for the speed of the particle moving along C, the component
$\kappa(ds/dt)^2$ can be written $v^2/\rho$. For motion in a circular path this is the familiar form
of the centripetal acceleration.

The tip of the vector ρN, drawn from the point P as initial point, is called
the center of curvature of C corresponding to P.

The plane of T and N is called the osculating plane of C at P.
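The splitting of the acceleration into tangential and normal parts can be illustrated with a concrete motion. The example below is an assumption made for illustration: a particle on a circle of radius r with position (r cos t², r sin t², 0), so that s = rt², ds/dt = 2rt, and d²s/dt² = 2r. The function names are hypothetical. The code projects the acceleration on T and recovers the curvature from the normal part.

```python
import math

def velocity(t, r):
    """dR/dt for R = (r cos t^2, r sin t^2, 0)."""
    return (-2.0 * r * t * math.sin(t * t), 2.0 * r * t * math.cos(t * t), 0.0)

def acceleration(t, r):
    """d^2R/dt^2 for the same motion."""
    return (-2.0 * r * math.sin(t * t) - 4.0 * r * t * t * math.cos(t * t),
            2.0 * r * math.cos(t * t) - 4.0 * r * t * t * math.sin(t * t),
            0.0)

def split_acceleration(v, a):
    """Return (speed, tangential component, magnitude of normal component)."""
    speed = math.sqrt(sum(c * c for c in v))
    T = tuple(c / speed for c in v)                   # valid while ds/dt > 0
    tangential = sum(ai * ti for ai, ti in zip(a, T))
    normal_vec = tuple(ai - tangential * ti for ai, ti in zip(a, T))
    normal = math.sqrt(sum(c * c for c in normal_vec))
    return speed, tangential, normal

r, t = 3.0, 0.7                                       # illustrative values only
speed, a_tan, a_nor = split_acceleration(velocity(t, r), acceleration(t, r))
kappa = a_nor / speed**2                              # normal part is kappa * v^2
print(a_tan, a_nor, kappa)
```

For this motion the tangential component should come out as 2r and the curvature as 1/r, so the normal component equals v²/r, the centripetal form.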


14.32 / BINORMAL. TORSION 

The vector

$$\mathbf{B} = \mathbf{T} \times \mathbf{N} \tag{14.32-1}$$

is called the binormal vector of C at P. Observe that T, N, and B are mutually
perpendicular unit vectors; moreover, they form a right-handed system, just as i,
j, and k do. The binormal B is perpendicular to the osculating plane. See Fig. 127.

There is a scalar quantity called torsion associated
with C at each point. To define torsion we
consider dB/ds. From (14.32-1) we have (see Exercise 1(b), §14.3)

$$\frac{d\mathbf{B}}{ds} = \mathbf{T} \times \frac{d\mathbf{N}}{ds} + \frac{d\mathbf{T}}{ds} \times \mathbf{N}.$$

We know by (14.31-3) that

$$\frac{d\mathbf{T}}{ds} \times \mathbf{N} = \kappa\mathbf{N} \times \mathbf{N} = 0,$$

since the cross product of any vector with itself is zero. Therefore

$$\frac{d\mathbf{B}}{ds} = \mathbf{T} \times \frac{d\mathbf{N}}{ds}. \tag{14.32-2}$$

Also, B · B = 1, since B has unit length. Therefore, by Exercise 2, §14.3,

$$\mathbf{B} \cdot \frac{d\mathbf{B}}{ds} = 0. \tag{14.32-3}$$

Now, if dB/ds ≠ 0, the two preceding equations show that dB/ds is perpendicular both
to B and to T; it is therefore a multiple of N, for N is also perpendicular to both
B and T. Thus we can write

$$\frac{d\mathbf{B}}{ds} = -\tau\mathbf{N}, \tag{14.32-4}$$

where −τ is simply the proper multiple of N to give dB/ds. The quantity τ thus
defined is called the torsion of C at the point under consideration. If dB/ds = 0, we
define τ = 0, so that (14.32-4) still holds.

For plane curves, B is constant, being a unit vector perpendicular to the
plane. For such curves τ = 0 at all points. A curve which does not lie in a single
plane is called a twisted curve; its torsion measures, to some extent, the amount
by which the curve is twisted.

For methods of finding T, N, B, κ, and τ from the parametric equations of
the curve, see Exercises 3, 4.
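The computational route indicated in Exercises 3 and 4 can be sketched as follows, using the relations κ = |R′ × R″| / |R′|³, τ = (R′ × R″) · R‴ / |R′ × R″|², B in the direction of R′ × R″, and N = B × T, with primes denoting differentiation with respect to t and the positive sense taken as that of increasing t. The helper names and the test curve (the helix x = a cos t, y = a sin t, z = ct with a = 2, c = 1) are illustrative assumptions.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def unit(u):
    n = math.sqrt(dot(u, u))
    return tuple(c / n for c in u)

def frenet(r1, r2, r3):
    """T, N, B, curvature, torsion from R', R'', R''' at one parameter value."""
    w = cross(r1, r2)
    if dot(w, w) == 0.0:
        raise ValueError("R' x R'' vanishes; N and B are not defined here")
    T = unit(r1)
    B = unit(w)
    N = cross(B, T)
    kappa = math.sqrt(dot(w, w)) / math.sqrt(dot(r1, r1)) ** 3
    tau = dot(w, r3) / dot(w, w)
    return T, N, B, kappa, tau

a, c, t = 2.0, 1.0, 0.3                               # illustrative values only
r1 = (-a * math.sin(t), a * math.cos(t), c)           # R'
r2 = (-a * math.cos(t), -a * math.sin(t), 0.0)        # R''
r3 = (a * math.sin(t), -a * math.cos(t), 0.0)         # R'''
T, N, B, kappa, tau = frenet(r1, r2, r3)
print(kappa, tau, N)
```

For this helix the curvature should equal a/(a² + c²) and the torsion c/(a² + c²), and N should point from the curve toward the z-axis.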


EXERCISES 

1. If P and P′ correspond to s and s + Δs, and if T and T + ΔT are the corresponding
tangents to C, let Δθ be the angle between these tangents. Show that

$$\lim_{\Delta s \to 0} \frac{\Delta\theta}{\Delta s} = \kappa.$$

2. Prove that

$$\frac{d\mathbf{N}}{ds} = -\kappa\mathbf{T} + \tau\mathbf{B}.$$

Formulas (14.31-3), (14.32-4), and the foregoing formula are called the Frenet formulas,
or the Serret-Frenet formulas.

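3. (a) Starting from (14.31-5), show that

$$\frac{d^3\mathbf{R}}{dt^3} = \left[\frac{d^3s}{dt^3} - \kappa^2\left(\frac{ds}{dt}\right)^3\right]\mathbf{T} + \left[3\kappa\,\frac{ds}{dt}\,\frac{d^2s}{dt^2} + \frac{d\kappa}{dt}\left(\frac{ds}{dt}\right)^2\right]\mathbf{N} + \kappa\tau\left(\frac{ds}{dt}\right)^3\mathbf{B}.$$

Use the result in Exercise 2.

(b) Show that

$$\frac{d\mathbf{R}}{dt} \times \frac{d^2\mathbf{R}}{dt^2} = \kappa\left(\frac{ds}{dt}\right)^3\mathbf{B}, \qquad \left(\frac{d\mathbf{R}}{dt} \times \frac{d^2\mathbf{R}}{dt^2}\right) \cdot \frac{d^3\mathbf{R}}{dt^3} = \kappa^2\tau\left(\frac{ds}{dt}\right)^6.$$

(c) Show that

$$\kappa = \frac{|\mathbf{R}' \times \mathbf{R}''|}{|\mathbf{R}'|^3}, \qquad \tau = \frac{(\mathbf{R}' \times \mathbf{R}'') \cdot \mathbf{R}'''}{|\mathbf{R}' \times \mathbf{R}''|^2},$$

where primes denote differentiation with respect to t. In terms of x, y, z as functions of t,
these formulas become

$$\kappa = \frac{\left[(y'z'' - y''z')^2 + (z'x'' - z''x')^2 + (x'y'' - x''y')^2\right]^{1/2}}{(x'^2 + y'^2 + z'^2)^{3/2}},$$

$$\tau = \frac{\begin{vmatrix} x' & y' & z' \\ x'' & y'' & z'' \\ x''' & y''' & z''' \end{vmatrix}}{(y'z'' - y''z')^2 + (z'x'' - z''x')^2 + (x'y'' - x''y')^2}.$$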

4. To find T, N, and B from the parametric equations of the curve, observe first of all
that T is a positive multiple of dR/dt, provided ds/dt > 0, i.e., provided the positive sense along
the curve is the same as that in which t increases. Since T is a unit vector, it follows that

$$\mathbf{T} = \frac{\mathbf{R}'}{|\mathbf{R}'|}.$$

Show with the help of Exercise 3(b) that

$$\mathbf{B} = \frac{\mathbf{R}' \times \mathbf{R}''}{|\mathbf{R}' \times \mathbf{R}''|}, \qquad \mathbf{N} = \frac{(\mathbf{R}' \times \mathbf{R}'') \times \mathbf{R}'}{|(\mathbf{R}' \times \mathbf{R}'') \times \mathbf{R}'|}.$$

If ds/dt < 0, the formulas for T and B require a minus sign on the right, but the formula for
N is unchanged. If T and B have already been found, N is easily found from N = B × T.
5. (a) Find T, N, B, κ, and τ at t = 0 on the curve x = t, y = t², z = t³.

(b) Find κ at z = 1 on the curve x = √z, y = 2√z.

(c) Find κ and τ at x = 3a on the curve 3ay = x², 2xz = a².

(d) Find T, N, B, κ, and τ at t = π on the curve x = t − sin t, y = 1 − cos t, z = 4 sin(t/2).

(e) Find κ and τ in terms of t for the curve x = cos t, y = sin t, z = $e^t$.

(f) For x = 3t − t³, y = 3t², z = 3t + t³, show that $\kappa = \tau = \dfrac{1}{3(1 + t^2)^2}$.

(g) Find T, N, B, κ, and τ for the conical helix x = 3t cos t, y = 3t sin t, z = 4t, at t = 0.

6. Consider the helix x = a cos t, y = a sin t, z = at tan α. It lies on the cylinder
$x^2 + y^2 = a^2$. It is a screw curve, with pitch angle α. As t increases, the point describes a
right-handed screw path if 0 < α < π/2, and a left-handed screw path if −π/2 < α < 0.

(a) Find the vectors T, N, B in terms of t. Show that T and B make constant angles with
the z-axis, and that N is perpendicular to the z-axis and directed toward it.

(b) Show that

$$\kappa = \frac{\cos^2\alpha}{a}, \qquad \tau = \frac{\sin\alpha\cos\alpha}{a}.$$

(c) Show that the center of curvature of the helix, corresponding to the point on the helix
at the tip of R, is at $\mathbf{R}_1 = \mathbf{R} + \rho\mathbf{N}$, and hence show that the center of curvature moves on a
helix whose pitch angle is (π/2) − α and which lies on a coaxial cylinder of radius a tan² α.
Show also that the center of curvature for this second helix is the original point on the
original helix.

7. Show that, if a point moves so that its velocity and acceleration vectors always
have unit length, κ = 1 at all points of the path.

8. For the curve x = a sin 2t, y = a + a cos 2t, z = 2a sin t (see Fig. 124) find T, N, B,
κ, and τ (a) at t = π/2; (b) at t = π/4; (c) at t = 0.

9. Suppose |R| = a, so that the tip of R moves on a sphere of radius a with center at
O. (a) Show that R = αN + βB, where α and β are related to ρ and τ by the equations
α = −ρ, βτ = −dρ/ds. (b) If τ = 0, show that the tip of R moves on a circle about the line
of B as an axis. (c) If τ ≠ 0, show that $\rho^2 + \dfrac{1}{\tau^2}\left(\dfrac{d\rho}{ds}\right)^2 = a^2$.


14.4 / SURFACES 

Speaking roughly, a surface is a configuration of points having a two-dimen- 
sional character; that is, a point moving on the surface, but otherwise un- 
restricted, has two degrees of freedom. This description of a surface is not a 
strict mathematical definition, however, and we need much greater precision in 
the work which is to follow. 

Considerations about surfaces are of two kinds, which we may designate 
respectively as local and in the large. By a local property of a surface at a point 
we mean a property which can be completely described and analyzed by 
considering only the part of the surface in the immediate vicinity of the point. A 
property “in the large” is a characteristic of the surface as a whole, and cannot 
be defined merely in terms of the features of the surface near a single point. The 
property of having a tangent plane at a given point is a local property. A surface 
may have this property at some points but not at others. Being a closed surface 
is a property in the large (a spherical surface is closed, while a circular disk is 
not). There is clearly a difference in the large between the surface of a basketball 
and the surface of an inner tube. The mathematical name for the latter surface is 
torus. There is a classification of surfaces according to certain properties in the 
large and ignoring local properties. In this classification a spherical and an 
ellipsoidal surface are of the same type, while the torus is of a different type. 
Properties in the large are in many ways harder to study than local properties. 
There are three common methods for the analytical representation of a
surface. A locus in space defined by

$$z = f(x, y), \tag{14.4-1}$$

where f is a single-valued continuous function defined on a region R of the
xy-plane, is called a surface. In order to have the surface consist of one piece we 
assume that R is a connected region. A point (x, y, z) is called an interior point of 
the surface if the corresponding point (x, y) of R is interior to R. The points 
( x , y, z) corresponding to the boundary of R form the boundary or edge of the 
surface. 

Often we have surfaces represented by equations of the form 


$$F(x, y, z) = 0. \tag{14.4-2}$$

If $(x_0, y_0, z_0)$ is a point of such a surface, we can in many cases represent the
portion of the surface near $(x_0, y_0, z_0)$ in a form analogous to (14.4-1) by solving
(14.4-2) for x, y, or z in terms of the two other variables.

For many purposes, especially in theoretical work, parametric representation 
of a surface is the most convenient. A locus defined by 

$$x = f(u, v), \quad y = g(u, v), \quad z = h(u, v), \tag{14.4-3}$$

where f, g, h are continuous functions defined on a connected region R of the
uv-plane, is called a parametric surface. There may be other parametric
representations of the same surface. This definition is so general that a 
parametric surface may not have any resemblance to what we would consider a 
surface to be, intuitively. It might, for example, intersect itself along infinitely 
many distinct curves in the vicinity of a single point. Hence, just as in dealing 
with curves in §14.1, we find it necessary to introduce more restrictive 
definitions. 

Suppose R is a closed rectangular region of the uv-plane: $a \le u \le b$, $c \le v \le d$,
and suppose the equations (14.4-3) define a continuous one-to-one mapping of
R onto a point set S in xyz-space. Then S is called a simple surface element. The
one-to-one condition means that distinct points of R are not mapped into the
same point of S. A simple surface element may be thought of as any configuration
which may be obtained from a rectangular plane region by continuous
deformation (bending, twisting, stretching, shrinking) without tearing and
without bringing together any points which were originally distinct. In particular,
a plane circular region is a simple surface element; so is a hemispherical
surface.

If S is a simple surface element corresponding to a rectangular region R in
the uv-plane, the points of S which correspond to the boundary of R form what
is called the boundary of S. Other points of S are called interior points of the
surface element.

The lateral surface of a right circular cylinder of finite length is not a simple 
surface element. It can, however, be regarded as composed of two simple 
surface elements joined together along the lines AB and CD
(see Fig. 128). One of the elements is $AA_1CDB_1B$; the other is
$AA_2CDB_2B$.

All surfaces which we shall subsequently consider may be 
thought of as being built up out of simple surface elements by 
matching together portions of the edges of the latter. This 
matching is always done in such a way that never more than 
two surface elements have an arc of edge in common. 

An infinite number of surface elements will be required 
to form a surface of infinite extent, such as a paraboloid 
or a complete cylinder, but, in considering surfaces in 
bounded portions of space, we shall assume that such surfaces can be formed 
from a finite number of simple surface elements. The exact manner of drawing 
the curves which divide a surface into elements is not unique; the curves can 
always be chosen so as to avoid the neighborhood of any particular point. Thus, 
in studying a local property of a surface, we may always assume that we are 
dealing with an interior point of a single surface element. 

The boundary of a surface consists of the unmatched edges of its surface 
elements. If there are no unmatched edges, there is no boundary. A surface 
which has no boundary, and which lies in a bounded portion of space, is called a 
closed surface. For example, the surface of a sphere is closed, but the cylinder 
surface in Fig. 128 is not closed. 

A surface is called smooth at a point P if it has a tangent plane at each point 
in the neighborhood of P, and if the direction of the normal to this plane varies 
continuously from point to point. The whole surface is called smooth if it is 
smooth at every point. 

Fig. 128.

The surface represented by (14.4-1) is smooth if f has continuous first
partial derivatives. The direction of the normal in this case is specified by the
ratios

$$\frac{\partial z}{\partial x} : \frac{\partial z}{\partial y} : -1, \tag{14.4-4}$$


as we saw in §6.2. In the representation (14.4-2) the surface is smooth at a point
where the three partial derivatives $\dfrac{\partial F}{\partial x}$, $\dfrac{\partial F}{\partial y}$, $\dfrac{\partial F}{\partial z}$ do not all vanish, provided they
are continuous in the neighborhood of the point. The normal direction is
specified by the ratios

$$F_1 : F_2 : F_3$$

(see Example 1, §6.6).

In the parametric case (14.4-3) let us suppose that the functions f, g, h have
continuous first partial derivatives. We introduce the notations

$$j_1 = \frac{\partial(y, z)}{\partial(u, v)}, \quad j_2 = \frac{\partial(z, x)}{\partial(u, v)}, \quad j_3 = \frac{\partial(x, y)}{\partial(u, v)}. \tag{14.4-5}$$

We assume that these three Jacobians do not vanish simultaneously. Under
these conditions we can show that the surface is smooth, and that the normal
direction is given by the ratios

$$j_1 : j_2 : j_3. \tag{14.4-6}$$

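For the proof, suppose $j_3 \ne 0$. Then, by the implicit-function theorem
(Theorem III, §8.3), the equations x = f(u, v), y = g(u, v) can be solved for u, v
in terms of x, y. Substituting the solutions into z = h(u, v) we obtain z as a
function of x, y. This means that the portion of the surface near a point where
$j_3 \ne 0$ can be represented in the form (14.4-1). By the rules for differentiating
composite functions we shall have

$$\frac{\partial z}{\partial x} = \frac{\partial z}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial z}{\partial v}\frac{\partial v}{\partial x}, \tag{14.4-7}$$

with a similar formula for $\dfrac{\partial z}{\partial y}$. If we regard u, v as dependent and x, y as
independent in the equations x = f(u, v), y = g(u, v), we have

$$1 = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial x},$$

$$0 = \frac{\partial g}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial g}{\partial v}\frac{\partial v}{\partial x}.$$

Solving for $\dfrac{\partial u}{\partial x}$ and $\dfrac{\partial v}{\partial x}$, we find

$$\frac{\partial u}{\partial x} = \frac{\begin{vmatrix} 1 & f_2 \\ 0 & g_2 \end{vmatrix}}{\begin{vmatrix} f_1 & f_2 \\ g_1 & g_2 \end{vmatrix}}, \qquad \frac{\partial v}{\partial x} = \frac{\begin{vmatrix} f_1 & 1 \\ g_1 & 0 \end{vmatrix}}{\begin{vmatrix} f_1 & f_2 \\ g_1 & g_2 \end{vmatrix}},$$

$$\frac{\partial u}{\partial x} = \frac{1}{j_3}\frac{\partial y}{\partial v}, \qquad \frac{\partial v}{\partial x} = -\frac{1}{j_3}\frac{\partial y}{\partial u}.$$

The foregoing formulas are the same as some of those in (9.1-5), except for
minor changes in notation. Hence, from (14.4-7)

$$\frac{\partial z}{\partial x} = \frac{1}{j_3}\left(\frac{\partial z}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial z}{\partial v}\frac{\partial y}{\partial u}\right) = -\frac{j_1}{j_3}.$$

Similarly,

$$0 = \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y},$$

$$1 = \frac{\partial g}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial g}{\partial v}\frac{\partial v}{\partial y},$$

whence

$$\frac{\partial u}{\partial y} = -\frac{1}{j_3}\frac{\partial x}{\partial v}, \qquad \frac{\partial v}{\partial y} = \frac{1}{j_3}\frac{\partial x}{\partial u},$$

and

$$\frac{\partial z}{\partial y} = -\frac{1}{j_3}\left(\frac{\partial z}{\partial u}\frac{\partial x}{\partial v} - \frac{\partial z}{\partial v}\frac{\partial x}{\partial u}\right) = -\frac{j_2}{j_3}.$$

It follows that

$$\frac{\partial z}{\partial x} : \frac{\partial z}{\partial y} : -1 = j_1 : j_2 : j_3.$$

Thus (14.4-4) and (14.4-6) define the same direction; this is what we set out to
prove.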

Once the direction of the normal is known, it is of course an easy matter to 
write out the equation of the tangent plane. 
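
The agreement between (14.4-4) and (14.4-6) can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes the paraboloid of revolution x = u cos v, y = u sin v, z = ku^2 as a test surface, estimates the three Jacobians by central differences, and compares the result with the ratios obtained from z = k(x^2 + y^2). The names `partials`, `jacobian_normal`, the step `eps`, and the sample point are illustrative choices.

```python
import math

def partials(func, u, v, eps=1e-6):
    """Central-difference estimates of d(func)/du and d(func)/dv at (u, v)."""
    d_u = (func(u + eps, v) - func(u - eps, v)) / (2 * eps)
    d_v = (func(u, v + eps) - func(u, v - eps)) / (2 * eps)
    return d_u, d_v

def jacobian_normal(f, g, h_fn, u, v):
    """Return numerical estimates of (j1, j2, j3) as defined in (14.4-5)."""
    xu, xv = partials(f, u, v)
    yu, yv = partials(g, u, v)
    zu, zv = partials(h_fn, u, v)
    j1 = yu * zv - yv * zu   # d(y, z)/d(u, v)
    j2 = zu * xv - zv * xu   # d(z, x)/d(u, v)
    j3 = xu * yv - xv * yu   # d(x, y)/d(u, v)
    return j1, j2, j3

# Assumed test surface: x = u cos v, y = u sin v, z = K u^2, i.e. z = K(x^2 + y^2).
K = 0.5
f = lambda u, v: u * math.cos(v)
g = lambda u, v: u * math.sin(v)
h_fn = lambda u, v: K * u * u

u0, v0 = 1.3, 0.7
j1, j2, j3 = jacobian_normal(f, g, h_fn, u0, v0)

# From z = K(x^2 + y^2): dz/dx = 2Kx, dz/dy = 2Ky, so (14.4-4) reads 2Kx : 2Ky : -1.
x0, y0 = f(u0, v0), g(u0, v0)
ratios_explicit = (2 * K * x0, 2 * K * y0, -1.0)

# Ratios agree when the triples are proportional; rescale so the third entries match.
scale = ratios_explicit[2] / j3
ratios_parametric = (j1 * scale, j2 * scale, j3 * scale)
print(ratios_parametric, ratios_explicit)
```

The rescaling step is valid only where j3 is not zero, which is the same assumption made in the proof.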

EXERCISES 

1. Show that the parametric surface defined by $x = a\sin\phi\cos\theta$, $y = b\sin\phi\sin\theta$,
$z = c\cos\phi$, $0 \le \theta \le 2\pi$, $0 \le \phi \le \pi$, is an ellipsoid, and that it is a sphere if a = b = c. The
surface is not a simple surface element, however. Which part of the definition of a simple
surface element is not satisfied in this case?

2. Explain how to divide the surface of a sphere into simple surface elements in 
several ways. In particular, if the sphere is x 2 +y 2 + z 2 = 1, describe a mode of division 
such that the points (0, 0, ± 1) are interior points of elements on which they lie. Describe a 
mode of division such that the points (±1,0,0) and (0, ±1,0) are interior points of the 
elements on which they lie. 

3. For the case of each of the following parametric surfaces, obtain an equation of 
the surface in rectangular co-ordinates. 

(a) $x = au\cos v$, $y = bu\sin v$, $z = u$ (elliptic cone).

(b) $x = u\cos v$, $y = u\sin v$, $z = ku^2$ (paraboloid of revolution).

(c) $x = a\sin u\cosh v$, $y = b\cos u\cosh v$, $z = c\sinh v$ (hyperboloid of one sheet).

(d) $x = r\cos\theta$, $y = r\sin\theta$, $z = (r^2/2)\sin 2\theta$ (hyperbolic paraboloid).

(e) $x = au\cos v$, $y = bu\sin v$, $z = u^2\cos 2v$ (hyperbolic paraboloid).

(f) $x = a\cosh v$, $y = b\cosh v\cos u$, $z = c\cosh v\sin u$.

4. Show that the tangent plane to the ellipsoid $(x^2/a^2) + (y^2/b^2) + (z^2/c^2) = 1$ at
$(x_0, y_0, z_0)$ is $(x_0x/a^2) + (y_0y/b^2) + (z_0z/c^2) = 1$.

5. Show that the direction of the normal to the surface in Exercise 3(e) is 
-2 bu cos v : 2 au sin v : ab. 

6. Describe the parametric surface x = a cos u, y = a sin u, z = v, and find its 
equation in rectangular co-ordinates. 

7. Describe the parametric surface $x = 2u + v$, $y = u - v$, $z = 3u$, and find its equation
in rectangular co-ordinates.

8. Show that the parametric surface $x = u + v$, $y = u - v$, $z = 4v^2$ is the parabolic
cylinder $z = (x - y)^2$. Show that the tangent plane at the point corresponding to (u, v) is
$4vx - 4vy - z = 4v^2$.

9. If the curve y = f(x) in the xy-plane is revolved around the x-axis, show that the
resulting surface can be represented parametrically in the form $x = u$, $y = f(u)\cos v$,
$z = f(u)\sin v$. Assuming that $f'(u)$ is continuous and $f(u) > 0$, show that the direction of
the normal is $f'(u) : -\cos v : -\sin v$.

10. (a) Consider the parametric surface $x = u + v$, $y = u^2 + v^2$, $z = u^3 + v^3$. Show that
it is part of the surface $x^3 - 3xy + 2z = 0$, and, in fact, just the part for which $x^2 \le 2y$ (u
and v are required to be real, of course). (b) Find the direction of the normal to the
surface at (0, 8, 0), using (14.4-4), and also using (14.4-6). (c) Show that the direction of
the normals is $6uv : -3(u + v) : 2$. Is this correct even when u = v? (Check with the result
obtained from the rectangular equation when u = v, i.e., when $2y = x^2$.)


14.5 / CURVES ON A SURFACE 


Consider a fixed surface. We are going to study local properties of curves on the 
surface, so we shall find it convenient to deal with a single simple surface 
element. We suppose equations (14.4-3) give a parametric representation of the 
element, and that the functions f, g, h have continuous first partial derivatives. If
u and v are made to depend continuously on a parameter t, then x, y, z will 
become continuous functions of t, and we shall have a curve defined on the 
surface. If we suppose u and v are continuously differentiable functions of t , 
any arc of the curve will be rectifiable. We wish to obtain a formula for the 
differential of the arc-length in terms of u, v, du , and dv. No matter what the 
parameter is, we have the following formulas: 

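$$ds^2 = dx^2 + dy^2 + dz^2,$$

$$dx = \frac{\partial x}{\partial u}\,du + \frac{\partial x}{\partial v}\,dv,$$

and similar formulas for dy and dz. Hence

$$ds^2 = \left(\frac{\partial x}{\partial u}\,du + \frac{\partial x}{\partial v}\,dv\right)^2 + \cdots.$$

On carrying out the details, we obtain the result

$$ds^2 = E\,du^2 + 2F\,du\,dv + G\,dv^2, \tag{14.5-1}$$

where

$$\begin{aligned}
E &= \left(\frac{\partial x}{\partial u}\right)^2 + \left(\frac{\partial y}{\partial u}\right)^2 + \left(\frac{\partial z}{\partial u}\right)^2, \\
F &= \frac{\partial x}{\partial u}\frac{\partial x}{\partial v} + \frac{\partial y}{\partial u}\frac{\partial y}{\partial v} + \frac{\partial z}{\partial u}\frac{\partial z}{\partial v}, \\
G &= \left(\frac{\partial x}{\partial v}\right)^2 + \left(\frac{\partial y}{\partial v}\right)^2 + \left(\frac{\partial z}{\partial v}\right)^2.
\end{aligned} \tag{14.5-2}$$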


We call attention to the structure of (14.5-1). It is a quadratic form in du and 
dv, with coefficients which are functions of u, v; these coefficients may be found 
directly from the parametric equations of the surface, as we see from (14.5-2). 
The quadratic form is called the first fundamental form of the surface. This form 
is of prime importance for what is called the metric differential geometry of the 
surface. 
Fig. 129.

Example. The parametric equations

$$\begin{aligned}
x &= (a + b\cos\phi)\cos\theta, \\
y &= (a + b\cos\phi)\sin\theta, \\
z &= b\sin\phi,
\end{aligned} \tag{14.5-3}$$

in which $0 < b < a$, and $\theta$ and $\phi$ have the ranges $0 \le \theta \le 2\pi$, $0 \le \phi \le 2\pi$, define a
torus. The part of the surface of the torus in the first octant is shown in Fig. 129.
This part is a simple surface element, corresponding to the
ranges $0 \le \theta \le \pi/2$, $0 \le \phi \le \pi$ of the parameters $\theta$, $\phi$. The
entire torus is a closed surface, which may be divided into
four simple surface elements, corresponding to the four
squares in the $\theta\phi$-plane shown in Fig. 130.

Let us calculate the first fundamental form for the
torus. We do this by direct calculation from (14.5-3), rather
than by substitution in (14.5-1) and (14.5-2). Here $\theta$ and $\phi$
play the roles of u and v. We have

$$\begin{aligned}
dx &= -(a + b\cos\phi)\sin\theta\,d\theta - b\sin\phi\cos\theta\,d\phi, \\
dy &= (a + b\cos\phi)\cos\theta\,d\theta - b\sin\phi\sin\theta\,d\phi, \\
dz &= b\cos\phi\,d\phi.
\end{aligned}$$

Hence, using the formula (14.2-6) for $ds^2$, we have

$$ds^2 = (a + b\cos\phi)^2(\sin^2\theta + \cos^2\theta)\,d\theta^2 + b^2\sin^2\phi\,(\sin^2\theta + \cos^2\theta)\,d\phi^2 + b^2\cos^2\phi\,d\phi^2,$$

$$ds^2 = (a + b\cos\phi)^2\,d\theta^2 + b^2\,d\phi^2. \tag{14.5-4}$$

Observe that there is no term involving $d\theta\,d\phi$.
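
The coefficients found for the torus can be cross-checked against the general formulas (14.5-2). The sketch below is an illustration only: it assumes sample radii A = 3 and B = 1, an arbitrary sample point, and central differences in place of exact partial derivatives. The helper name `first_fundamental_form` is hypothetical.

```python
import math

def first_fundamental_form(r, u, v, eps=1e-6):
    """Estimate E, F, G of (14.5-2) for a map r(u, v) -> (x, y, z)."""
    r_u = [(p - m) / (2 * eps) for p, m in zip(r(u + eps, v), r(u - eps, v))]
    r_v = [(p - m) / (2 * eps) for p, m in zip(r(u, v + eps), r(u, v - eps))]
    E = sum(c * c for c in r_u)
    F = sum(p * q for p, q in zip(r_u, r_v))
    G = sum(c * c for c in r_v)
    return E, F, G

A, B = 3.0, 1.0  # assumed sample values with 0 < B < A

def torus(theta, phi):
    """Parametric equations (14.5-3), with theta and phi in the roles of u and v."""
    rho = A + B * math.cos(phi)
    return (rho * math.cos(theta), rho * math.sin(theta), B * math.sin(phi))

theta0, phi0 = 0.4, 1.1
E, F, G = first_fundamental_form(torus, theta0, phi0)

# Values predicted by (14.5-4): E = (a + b cos phi)^2, F = 0, G = b^2.
E_expected = (A + B * math.cos(phi0)) ** 2
G_expected = B ** 2
print(E, E_expected, F, G, G_expected)
```

Up to the finite-difference error, F comes out as zero, in line with the absence of a mixed term in (14.5-4).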
When a surface is defined in terms of parameters u, v, the curves u =
constant and v = constant are called co-ordinate curves. Along a curve v =
constant (called a u-curve) we may consider u as a parameter of the curve. The
direction of the tangent to this curve is given by the ratios

$$\frac{\partial x}{\partial u} : \frac{\partial y}{\partial u} : \frac{\partial z}{\partial u}.$$

Likewise the direction of the tangent to a v-curve (u = constant) is

$$\frac{\partial x}{\partial v} : \frac{\partial y}{\partial v} : \frac{\partial z}{\partial v}.$$

If the two sets of curves intersect orthogonally at each point, we see from
(14.5-2) that F = 0. In this case, then, the first fundamental form becomes

$$ds^2 = E\,du^2 + G\,dv^2,$$

and no term in du dv is present.

Along a u-curve $ds = \pm\sqrt{E}\,du$, and along a v-curve $ds = \pm\sqrt{G}\,dv$. Hence, if
by direct geometric observation we can determine the values of $\dfrac{ds}{du}$ along a
u-curve, we can find E without resorting to the work of calculating from
(14.5-2). Similar remarks apply to G. In the case of a number of familiar
surfaces, these short cuts are quite convenient.

These short cuts are available in the case of the torus of the illustrative
Example. In that example the $\theta$-curves are circles in horizontal planes, with
centers on the z-axis. The radius of a $\theta$-circle is $a + b\cos\phi$, and hence
$ds = \pm(a + b\cos\phi)\,d\theta$ along the circle. The coefficient of $d\theta^2$ in the general
formula for $ds^2$ is therefore $(a + b\cos\phi)^2$. Discussion of the $\phi$-curves is left to
the student. In this example the co-ordinate curves intersect each other at right
angles. This accounts for the absence of the $d\theta\,d\phi$ term in the quadratic form.

From the directions of the tangents to the co-ordinate curves we may
determine the direction of their common perpendicular, which is normal to the
surface. Use of the usual scheme of three two-row determinants shows us that
the direction of the normal is $j_1 : j_2 : j_3$. Thus we obtain the result (14.4-6) in a
different way.

For certain purposes it is very convenient to use sets of variables with
indices. Thus, we might write $x_1, x_2, x_3$ instead of x, y, z, and $u_1, u_2$ instead of u, v.
If this is done the formulas for E, F, G can be written compactly in summation
notation; for example,

$$F = \sum_{i=1}^{3} \frac{\partial x_i}{\partial u_1}\frac{\partial x_i}{\partial u_2}.$$

The formula for $ds^2$ can be written

$$ds^2 = \sum_{\alpha,\beta=1}^{2} \sum_{i=1}^{3} \frac{\partial x_i}{\partial u_\alpha}\frac{\partial x_i}{\partial u_\beta}\,du_\alpha\,du_\beta.$$
In practice, the index notation is normally used when tensor methods are being
employed. The conventions of tensor notation require the use of two kinds of
indices, upper and lower, and for the present case the convention calls for the
notations $x^1, x^2, x^3, u^1, u^2$. The upper indices are not exponents, but merely
distinguishing marks. The coefficients in the formula for $ds^2$ are now denoted by
$g_{11}, g_{12}, g_{21}, g_{22}$; $g_{11}$ being the coefficient of $du^1\,du^1$; $g_{12}$ that of $du^1\,du^2$; and so on.
Thus

$$ds^2 = \sum_{\alpha,\beta=1}^{2} g_{\alpha\beta}\,du^\alpha\,du^\beta,$$

where

$$g_{\alpha\beta} = \sum_{i=1}^{3} \frac{\partial x^i}{\partial u^\alpha}\frac{\partial x^i}{\partial u^\beta}.$$

Observe that $g_{12} = g_{21}$. In our previous notation, $g_{11} = E$, $g_{12} = F$, $g_{22} = G$. The
four quantities $g_{11}, g_{12}, g_{21}, g_{22}$ are the components of what is called the
fundamental metric tensor of the surface. In all this discussion we have not defined
the term tensor. To do so is outside the scope of this book. Suffice it to say that
the concept of a tensor is an extension of the concept of a vector.
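
As a brief worked instance of the index notation, offered as an illustration rather than as part of the original text, take the sphere $x^1 = a\sin\phi\cos\theta$, $x^2 = a\sin\phi\sin\theta$, $x^3 = a\cos\phi$, with the assumed labelling $u^1 = \theta$, $u^2 = \phi$. Applying the summation formula for $g_{\alpha\beta}$ gives:

```latex
\begin{aligned}
g_{11} &= \sum_{i=1}^{3}\left(\frac{\partial x^i}{\partial \theta}\right)^2
        = a^2\sin^2\phi\,(\sin^2\theta + \cos^2\theta) = a^2\sin^2\phi, \\
g_{12} = g_{21} &= \sum_{i=1}^{3}\frac{\partial x^i}{\partial \theta}\frac{\partial x^i}{\partial \phi}
        = -a^2\sin\phi\cos\phi\sin\theta\cos\theta + a^2\sin\phi\cos\phi\sin\theta\cos\theta = 0, \\
g_{22} &= \sum_{i=1}^{3}\left(\frac{\partial x^i}{\partial \phi}\right)^2
        = a^2\cos^2\phi\,(\cos^2\theta + \sin^2\theta) + a^2\sin^2\phi = a^2, \\
ds^2 &= a^2\sin^2\phi\,d\theta^2 + a^2\,d\phi^2 .
\end{aligned}
```

The vanishing of $g_{12}$ reflects the fact that the co-ordinate curves on the sphere (parallels and meridians) meet at right angles.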

EXERCISES 

1. Describe the $\phi$-curves in the torus of Fig. 129, and, by direct inspection, find the
formula for ds along a $\phi$-curve.

2. Describe the $\theta$-curves and $\phi$-curves on the sphere $x = a\sin\phi\cos\theta$,
$y = a\sin\phi\sin\theta$, $z = a\cos\phi$, and find the formula for ds along each curve, thus obtaining the
first fundamental form $ds^2 = a^2(\sin^2\phi\,d\theta^2 + d\phi^2)$.

3. Describe the u-curves and v-curves on the cylinder $x = u$, $y = a\sin v$, $z = a\cos v$,
and find the general formula for $ds^2$.

4. Proceed as in Exercise 3 with the cone $x = u\sin v$, $y = mu$, $z = u\cos v$.

5. Calculate the first fundamental form for each of the following surfaces:

(a) $x = u\cos v$, $y = u\sin v$, $z = ku^2$.

(b) $x = a\sin u\cosh v$, $y = b\cos u\cosh v$, $z = c\sinh v$.

(c) $x = r\cos\theta$, $y = r\sin\theta$, $z = (r^2/2)\sin 2\theta$.

(d) $x = u + v$, $y = u - v$, $z = 4v^2$.

(e) $x = u + v$, $y = u^2 + v^2$, $z = u^3 + v^3$.

6. (a) Let $\lambda = (\pi/2) - \phi$ be the latitude of a point on the sphere of Exercise 2. Let a
curve on the sphere be given. If $\omega$ is the angle at which the curve crosses the circle of
latitude $\lambda$, show that

$$\frac{1}{\cos\lambda}\frac{d\lambda}{d\theta} = \tan\omega$$

if a suitable convention is made about the sign of $\omega$.

(b) If $\omega$ is constant, the curve is called a rhumb line, or a loxodrome. Find the equation in
terms of $\lambda$ and $\theta$ of the loxodrome through $\lambda = 0$, $\theta = 0$ with $\omega$ given ($0 < \omega < \pi/2$). Show
that, although the curve spirals infinitely often about both poles of the sphere, its length is
finite and equal to $\pi a\csc\omega$.



14.6 / SURFACE AREA 

One of the first objectives in the study of surfaces with the aid of calculus is the 
derivation of formulas for calculating the area of a surface. There are various 
ways of getting at such formulas. We begin with a method which is frankly 
heuristic; that is, we formulate an argument which is plausible and which leads 
to a formula for the area as a double integral. This argument will not do, 
however, as the last word on the subject, for it is not based on an explicit 
definition of the area of a surface, but rather on our intuition of what ought to be 
true of such areas. 

Consider a simple surface element, as given by parametric equations (14.4-3),
with the parameters (u, v) representing a point varying over the rectangle
R: $a \le u \le b$, $c \le v \le d$ in the uv-plane. Suppose that this rectangle is divided up
into small cells by a rectangular partition after the fashion used in defining a
double integral (§13.2). We shall suppose that the functions f, g, h entering in
the parametric representation have continuous first partial derivatives in the
closed region R. We shall use vector notation in what follows. We think of
(14.4-3) as a vector mapping from the rectangle R into $R^3$:

$$\mathbf{F}(u, v) = (f(u, v), g(u, v), h(u, v)). \tag{14.6-1}$$

By Theorem VII of Chapter 12, we know that F is continuously differentiable
because we have assumed that the coordinate functions f, g, h have continuous
first partial derivatives. Consider a typical cell in the interior of R and the
corresponding cell of the surface element (Fig. 131a and Fig. 131b). We first seek
to find an approximate expression for the area of this small cell of the surface 
element. The guiding idea is to think of the cell on the curved surface element as 
being approximately a plane parallelogram. The plane of this parallelogram is the 
plane tangent to the surface element at one corner of the cell under con- 
sideration. The parallelogram itself is the figure into which the cell in R is 
mapped by a certain affine transformation which we obtain by using the 
differential of F. To explain this more clearly we must now introduce some 
additional notation. 

In Fig. 132a we designate the lower left corner A of the typical cell in R by
$(u_i, v_j)$. Denote the other corners by B, C, D as indicated, so that B is $(u_i + \Delta u, v_j)$ and
C is $(u_i, v_j + \Delta v)$. In Fig. 132b is shown the cell A'B'C'D' on the surface element,
the image of the cell ABCD under the mapping function F. In particular, A' is
the vector $\mathbf{F}(u_i, v_j)$.

Fig. 131a. Fig. 131b. Fig. 132a. Fig. 132b.

To locate a point in the cell ABCD of Fig. 132a, we write its coordinates as
$(u_i + \delta u, v_j + \delta v)$, where $0 \le \delta u \le \Delta u$, $0 \le \delta v \le \Delta v$. Now consider the affine
mapping from the uv-plane into $R^3$ that carries $(u_i + \delta u, v_j + \delta v)$ into

$$\mathbf{F}(u_i, v_j) + \mathbf{F}'(u_i, v_j)(\delta u, \delta v). \tag{14.6-2}$$


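Here the derivative $\mathbf{F}'(u_i, v_j)$, as we know from Theorem III in Chapter 12, is
represented by the 3 x 2 matrix

$$\begin{pmatrix}
\dfrac{\partial f}{\partial u} & \dfrac{\partial f}{\partial v} \\
\dfrac{\partial g}{\partial u} & \dfrac{\partial g}{\partial v} \\
\dfrac{\partial h}{\partial u} & \dfrac{\partial h}{\partial v}
\end{pmatrix}$$

in which the partial derivatives are evaluated at $(u_i, v_j)$. This matrix transforms
the vector $(\delta u, \delta v)$ into the vector

$$\mathbf{F}'(u_i, v_j)(\delta u, \delta v) = \left(\frac{\partial f}{\partial u}\delta u + \frac{\partial f}{\partial v}\delta v,\ \frac{\partial g}{\partial u}\delta u + \frac{\partial g}{\partial v}\delta v,\ \frac{\partial h}{\partial u}\delta u + \frac{\partial h}{\partial v}\delta v\right). \tag{14.6-3}$$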

The affine mapping (14.6-2) maps the portion of the uv-plane near A into a
portion of the plane tangent at A' to the surface element in $R^3$, and maps the
rectangle ABCD into a parallelogram in the tangent plane, with AB mapping into
A'B", AC mapping into A'C", and so on. (Lines are mapped into lines because
the mapping is affine.) That the rectangle ABCD maps into a piece of the tangent
plane is because of the basic property of the differential $\mathbf{F}'(u_i, v_j)(\delta u, \delta v)$, which
is an approximation to the difference $\mathbf{F}(u_i + \delta u, v_j + \delta v) - \mathbf{F}(u_i, v_j)$.

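We shall need the following relationships:

$$\mathbf{F}'(u_i, v_j)(\Delta u, 0) = \overrightarrow{A'B''}, \qquad \mathbf{F}'(u_i, v_j)(0, \Delta v) = \overrightarrow{A'C''}.$$

The first of these follows from the fact that, when $\delta u = \Delta u$ and $\delta v = 0$ in
(14.6-2), we obtain the affine mapping image of B, which is B", while
the affine mapping image of A is A'. The vector difference is therefore $\overrightarrow{A'B''}$. The
other relation is obtained in a similar way. Therefore we have, from (14.6-3),

$$\overrightarrow{A'B''} = \left(\frac{\partial f}{\partial u}\Delta u,\ \frac{\partial g}{\partial u}\Delta u,\ \frac{\partial h}{\partial u}\Delta u\right), \qquad \overrightarrow{A'C''} = \left(\frac{\partial f}{\partial v}\Delta v,\ \frac{\partial g}{\partial v}\Delta v,\ \frac{\partial h}{\partial v}\Delta v\right).$$

Now, the area of the parallelogram A'B"C"D" is equal to the magnitude of the
cross product of the vectors $\overrightarrow{A'B''}$ and $\overrightarrow{A'C''}$. This cross product is [see (10.2-7)]

$$\overrightarrow{A'B''} \times \overrightarrow{A'C''} = \begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
\dfrac{\partial f}{\partial u}\Delta u & \dfrac{\partial g}{\partial u}\Delta u & \dfrac{\partial h}{\partial u}\Delta u \\
\dfrac{\partial f}{\partial v}\Delta v & \dfrac{\partial g}{\partial v}\Delta v & \dfrac{\partial h}{\partial v}\Delta v
\end{vmatrix}$$

or, in the notation of (14.4-5),

$$\overrightarrow{A'B''} \times \overrightarrow{A'C''} = j_1\,\Delta u\,\Delta v\,\mathbf{i} + j_2\,\Delta u\,\Delta v\,\mathbf{j} + j_3\,\Delta u\,\Delta v\,\mathbf{k},$$

so that the area of the parallelogram A'B"C"D" is

$$\sqrt{j_1^2 + j_2^2 + j_3^2}\ \Delta u\,\Delta v. \tag{14.6-4}$$

Here the Jacobians $j_1, j_2, j_3$ are evaluated at $(u_i, v_j)$.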

Our plausibility argument now continues by using the expression in (14.6-4) 
as our approximation to the area of the cell A'B'C'D' on the surface element. 
The next step in the process is to form the sum of all the terms of the form 
(14.6-4), corresponding to all of the cells into which R is divided in Fig. 131a. 
When we form this sum and then find its limit as the partition of R is refined, we 
obtain a double integral as the limit. This double integral is 


$$S = \iint_R \sqrt{j_1^2 + j_2^2 + j_3^2}\ du\,dv. \tag{14.6-5}$$

We are led in this manner to define by this double integral the area S of the 
surface element into which R is mapped by (14.6-1). 
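
The limiting process just described can be imitated numerically. The sketch below is illustrative and rests on stated assumptions: it sums the parallelogram areas (14.6-4) over a uniform partition, evaluates the Jacobians at cell midpoints by central differences, and uses a sphere of radius 2 as the test surface. The function name `surface_area`, the grid size `n`, and the step `eps` are hypothetical choices. The sphere map is not one-to-one on the edges of the parameter rectangle, but those edges contribute nothing to the sum.

```python
import math

def surface_area(r, u_range, v_range, n=100, eps=1e-6):
    """Approximate (14.6-5) by summing sqrt(j1^2 + j2^2 + j3^2) du dv over an n-by-n grid."""
    (u_lo, u_hi), (v_lo, v_hi) = u_range, v_range
    du, dv = (u_hi - u_lo) / n, (v_hi - v_lo) / n
    total = 0.0
    for i in range(n):
        u = u_lo + (i + 0.5) * du          # midpoint of the cell in u
        for k in range(n):
            v = v_lo + (k + 0.5) * dv      # midpoint of the cell in v
            r_u = [(p - m) / (2 * eps) for p, m in zip(r(u + eps, v), r(u - eps, v))]
            r_v = [(p - m) / (2 * eps) for p, m in zip(r(u, v + eps), r(u, v - eps))]
            j1 = r_u[1] * r_v[2] - r_u[2] * r_v[1]   # d(y, z)/d(u, v)
            j2 = r_u[2] * r_v[0] - r_u[0] * r_v[2]   # d(z, x)/d(u, v)
            j3 = r_u[0] * r_v[1] - r_u[1] * r_v[0]   # d(x, y)/d(u, v)
            total += math.sqrt(j1 * j1 + j2 * j2 + j3 * j3) * du * dv
    return total

RADIUS = 2.0  # assumed sample radius

def sphere(theta, phi):
    s = math.sin(phi)
    return (RADIUS * s * math.cos(theta), RADIUS * s * math.sin(theta), RADIUS * math.cos(phi))

approx = surface_area(sphere, (0.0, 2 * math.pi), (0.0, math.pi))
exact = 4 * math.pi * RADIUS ** 2
print(approx, exact)
```

Refining the partition (larger `n`) brings the sum closer to the exact value, which mirrors the passage to the limit in the text.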

This formula can be given another appearance, using the coefficients E, F, G
from the quadratic form expressing ds on the surface in terms of du and dv (see
(14.5-2)). The alternative expression for the double integral is

$$S = \iint_R \sqrt{EG - F^2}\ du\,dv. \tag{14.6-6}$$

One way of passing from (14.6-5) to (14.6-6) is to make the straightforward
calculation showing that

$$EG - F^2 = j_1^2 + j_2^2 + j_3^2. \tag{14.6-7}$$

In working problems it will sometimes be found to be easier to compute
$j_1^2 + j_2^2 + j_3^2$ than $EG - F^2$ and vice versa.
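
The identity (14.6-7) depends only on the two tangent vectors, so it can be spot-checked on arbitrary triples. In this small sketch, `r_u` and `r_v` are stand-ins (an assumption of the example) for the vectors of partial derivatives of (x, y, z) with respect to u and v, and random values are used in place of any particular surface.

```python
import random

def dot(p, q):
    return sum(a * b for a, b in zip(p, q))

def both_sides(r_u, r_v):
    """Return (EG - F^2, j1^2 + j2^2 + j3^2) for the given tangent vectors."""
    E, F, G = dot(r_u, r_u), dot(r_u, r_v), dot(r_v, r_v)
    j1 = r_u[1] * r_v[2] - r_u[2] * r_v[1]
    j2 = r_u[2] * r_v[0] - r_u[0] * r_v[2]
    j3 = r_u[0] * r_v[1] - r_u[1] * r_v[0]
    return E * G - F * F, j1 * j1 + j2 * j2 + j3 * j3

random.seed(0)
max_gap = 0.0
for _ in range(1000):
    r_u = [random.uniform(-1, 1) for _ in range(3)]
    r_v = [random.uniform(-1, 1) for _ in range(3)]
    lhs, rhs = both_sides(r_u, r_v)
    max_gap = max(max_gap, abs(lhs - rhs))
print(max_gap)
```

A numerical check of this kind is no substitute for the algebraic verification, but it is a quick guard against sign or index slips when the Jacobians are computed by hand.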

Example 1. Compute the total area of the torus $x = (a + b\cos\phi)\cos\theta$,
$y = (a + b\cos\phi)\sin\theta$, $z = b\sin\phi$, $0 < b < a$.

This torus was discussed in §14.5 (see Fig. 129). The part in the first octant is
a simple surface element corresponding to $0 \le \theta \le \pi/2$, $0 \le \phi \le \pi$, and the area
of this part is one eighth of the total. From (14.5-4) we see that, if $u = \theta$ and
$v = \phi$,

$$E = (a + b\cos\phi)^2, \quad F = 0, \quad G = b^2.$$

Thus $\sqrt{EG - F^2} = b(a + b\cos\phi)$, and the total area is

$$S = 8\int_0^{\pi} d\phi \int_0^{\pi/2} b(a + b\cos\phi)\,d\theta = 4\pi^2 ab.$$

This is in accord with the theorem of Pappus.

If we use (14.6-5) instead of (14.6-6) to calculate the first octant portion of
the torus, we find that

$$\begin{aligned}
j_1 &= b(a + b\cos\phi)\cos\theta\cos\phi, \\
j_2 &= b(a + b\cos\phi)\sin\theta\cos\phi, \\
j_3 &= b(a + b\cos\phi)\sin\phi,
\end{aligned}$$

from which

$$j_1^2 + j_2^2 + j_3^2 = b^2(a + b\cos\phi)^2.$$

Thus, the identity (14.6-7) is verified in this particular case, and the integral for
the area is calculated as before.

In the argument leading up to (14.6-5), it will be seen that the fact that the
region R was a rectangle in the uv-plane was not essential. If R is any bounded
closed region of the uv-plane of the type described in the discussion of double
integrals in §13.2, the discussion leading up to (14.6-4) applies to any cell in a
rectangular partition of the type shown in Fig. 131a, and (14.6-4) gives the area
of the parallelogram which is the image of this cell under the affine mapping
(14.6-2). Hence, the formulas (14.6-5) and (14.6-6) can be used to find the area
of any portion of a smooth surface which is obtained by a one-to-one and
continuously differentiable mapping from the region R in the uv-plane.

We now consider the special case of a surface defined by an equation
z = f(x, y) for all (x, y) belonging to some region R in the xy-plane. It will be
assumed that f has continuous first partial derivatives in R. We can think of x and y
as being the parameters u, v. This leads us to the following very special case of
(14.4-3).

$$x = x, \quad y = y, \quad z = f(x, y). \tag{14.6-8}$$

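By simple calculations we see from (14.4-5) that

$$j_1 = -\frac{\partial f}{\partial x}, \quad j_2 = -\frac{\partial f}{\partial y}, \quad j_3 = 1.$$

Consequently, the area of the portion of the surface corresponding to the plane
region R is

$$S = \iint_R \sqrt{1 + \left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2}\ dx\,dy. \tag{14.6-9}$$

Alternatively, if we wish to use (14.6-6), we can calculate as follows:

$$dz = \frac{\partial z}{\partial x}\,dx + \frac{\partial z}{\partial y}\,dy,$$

$$\begin{aligned}
ds^2 &= dx^2 + dy^2 + \left(\frac{\partial z}{\partial x}\,dx + \frac{\partial z}{\partial y}\,dy\right)^2 \\
     &= \left[1 + \left(\frac{\partial z}{\partial x}\right)^2\right]dx^2 + 2\,\frac{\partial z}{\partial x}\frac{\partial z}{\partial y}\,dx\,dy + \left[1 + \left(\frac{\partial z}{\partial y}\right)^2\right]dy^2.
\end{aligned}$$

Interpreting x as u and y as v, we have

$$E = 1 + \left(\frac{\partial z}{\partial x}\right)^2, \quad F = \frac{\partial z}{\partial x}\frac{\partial z}{\partial y}, \quad G = 1 + \left(\frac{\partial z}{\partial y}\right)^2.$$

Hence the area of the surface is

$$S = \iint_R \sqrt{1 + \left(\frac{\partial z}{\partial x}\right)^2 + \left(\frac{\partial z}{\partial y}\right)^2}\ dx\,dy. \tag{14.6-10}$$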

Example 2. Find the area of the upper half of the sphere $x^2 + y^2 + z^2 = a^2$ by
using formula (14.6-10).

Here

$$z = \sqrt{a^2 - x^2 - y^2}, \qquad \frac{\partial z}{\partial x} = \frac{-x}{\sqrt{a^2 - x^2 - y^2}},$$

with a similar formula for $\dfrac{\partial z}{\partial y}$. Thus the integrand in (14.6-10) becomes

$$\left[1 + \frac{x^2}{a^2 - x^2 - y^2} + \frac{y^2}{a^2 - x^2 - y^2}\right]^{1/2} = \frac{a}{\sqrt{a^2 - x^2 - y^2}}.$$

There is one difficulty. The hemisphere lies above the region R bounded by the
circle $x^2 + y^2 = a^2$ in the xy-plane, and we see that the partial derivatives of z
with respect to x and y are not defined on the boundary of this region.
Moreover, the integral (14.6-10) in this case is

$$\iint_R \frac{a\,dx\,dy}{\sqrt{a^2 - x^2 - y^2}};$$

this is an improper integral, since the integrand becomes infinite at the boundary
of R. We can get around the difficulty by considering the area above a smaller
concentric circle of radius b, and then making $b \to a$ afterward. It is convenient
to evaluate the double integral by an iterated integral in polar co-ordinates. The
result is

$$S = \lim_{b \to a} \int_0^{2\pi} d\theta \int_0^{b} \frac{a\,r\,dr}{\sqrt{a^2 - r^2}} = \lim_{b \to a} 2\pi a\left[a - \sqrt{a^2 - b^2}\right],$$

or $S = 2\pi a^2$. This is correct for the hemisphere.
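
The limiting procedure in Example 2 can be watched numerically. The sketch below is an illustration under stated assumptions: `cap_area_closed_form` evaluates the expression $2\pi a[a - \sqrt{a^2 - b^2}]$ for the area above the circle of radius b, and `cap_area_numeric` approximates the same polar integral by a midpoint rule. Both function names and the sample values of b are hypothetical.

```python
import math

def cap_area_closed_form(a, b):
    """Area of the part of the hemisphere above the disk of radius b (0 <= b <= a)."""
    return 2 * math.pi * a * (a - math.sqrt(a * a - b * b))

def cap_area_numeric(a, b, n=20000):
    """Midpoint rule for 2*pi * integral from 0 to b of a*r / sqrt(a^2 - r^2) dr, b < a."""
    dr = b / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        total += a * r / math.sqrt(a * a - r * r) * dr
    return 2 * math.pi * total

a = 1.0
for b in (0.5, 0.9, 0.99, 0.999999):
    # The closed form climbs toward 2*pi*a^2 as b approaches a.
    print(b, cap_area_closed_form(a, b))

numeric = cap_area_numeric(a, 0.9)
closed = cap_area_closed_form(a, 0.9)
hemisphere = 2 * math.pi * a * a
```

The numerical integral is taken only for b strictly less than a, where the integrand is bounded; this is exactly the restriction that makes the limit necessary in the text.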


We may put formula (14.6-10) in a different form. Let $\alpha$, $\beta$, $\gamma$ be the angles
which the normal to the surface z = f(x, y) makes with the positive axes, the
positive direction of the normal being chosen so that $\gamma$ is acute. Now

$$\cos\alpha : \cos\beta : \cos\gamma = \frac{\partial z}{\partial x} : \frac{\partial z}{\partial y} : -1.$$

Hence

$$\sec^2\gamma = \left(\frac{\partial z}{\partial x}\right)^2 + \left(\frac{\partial z}{\partial y}\right)^2 + 1, \tag{14.6-11}$$

and so from (14.6-10) we have

$$S = \iint_R \sec\gamma\ dA. \tag{14.6-12}$$

Let R be a connected region. Applying the mean-value theorem to formula
(14.6-12), we conclude that there is a point P' on the surface such that, if $\gamma'$ is
the value of $\gamma$ at P' and A is the area of R, then $S = A\sec\gamma'$. If a cylindrical
surface parallel to the z-axis is constructed around the boundary of R, and if the
plane tangent at P' to the original surface is constructed, then
the area of this plane which is cut off within the cylinder is
exactly $A\sec\gamma'$ (see Fig. 133). Hence, the area of the original
surface z = f(x, y) is the same as the area cut from a certain
one of its tangent planes by the cylinder parallel to the z-axis
and intersecting the xy-plane in the boundary of the region R.

If the region R is very small, it follows that $A\sec\gamma$ is a good
approximation to S, no matter at which point P of the surface
we evaluate $\gamma$. This remark is sometimes taken as the
intuitive foundation for a derivation of formula (14.6-12), the
procedure being to subdivide R into small parts, the area S
being obtained as the limit of the sum of areas $\Delta A\sec\gamma$. It is this derivation
which is usually found in elementary calculus textbooks.

Fig. 133.

Formulas (14.6-6) and (14.6-12) are the standard formulas of calculus for 
dealing with surface area. Where, however, is the definition of surface area? Are 
there surfaces which have area, and yet which are such that the area cannot be 
found by the integrals mentioned above, perhaps because of lack of sufficient 
smoothness? It is logically and aesthetically desirable to have a definition of 
surface area which is directly geometric, and which does not put too many 
restrictions on the surface. A good definition ought not to depend upon the 
method of representing the surface analytically, and should not be limited to 
smooth surfaces. The demand for such a definition poses a very difficult 
problem, however. It may surprise the student to know that the problem has 
occupied the attention of many able mathematicians over the last fifty years, and 
that the end of research on the question is not yet in sight. 

To present the concept of surface area to the student at the advanced 
calculus level, the most satisfactory logical approach seems to be the following: 
for a smooth simple surface element, with appropriate conditions on its 
parametric representation, formula (14.6-6) is to be taken as a definition; the 
discussion leading up to the formula is by way of motivation. It can be shown 
that the area so defined is independent of the particular parametrization, and is 
therefore an intrinsic characteristic of the surface. This demonstration requires 
the theory of transformation of double integrals, and is discussed in Chapter 15. 
In the particular case of surfaces z = f(x, y), the formula (14.6-12) makes it clear
that the area does not depend on the parametrization of the surface, but it must 
still be shown that the orientation of the z-axis is inessential, since the direction 
of this axis plays a role in the formula. 


EXERCISES 

1. Find the area of a sphere, using the parametric representation

$$x = a \sin\phi \cos\theta, \quad y = a \sin\phi \sin\theta, \quad z = a \cos\phi.$$

2. Find the area of the part of the cylinder $x^2 + z^2 = a^2$ inside the cylinder
$y^2 = a(x + a)$.

3. Find the area of the part of the cone $x^2 + y^2 = z^2$ inside the cylinder $x^2 + y^2 = 2ax$.

4. Find the area of the part of the surface $z = xy$ inside the cylinder $x^2 + y^2 = a^2$.

5. Find the area of the surface element $x = au \cos v$, $y = bu \sin v$,
$z = \tfrac{1}{2}u^2(a \cos^2 v + b \sin^2 v)$, $0 \le u \le 1$, $0 \le v \le 2\pi$. Identify the surface and the
portion of it whose area is found.

6. A part of the surface $z^2 = 2xy$ can be parametrized by $x = u^2$, $y = v^2$,
$z = \sqrt{2}\,uv$. (a) Find the area of the part of the surface above the rectangle $0 \le x \le a$,
$0 \le y \le b$. (b) Find the area of the part of the surface above the region in the xy-plane
between the x- and y-axes and the curve $x^{1/2} + y^{1/2} = 1$. Compare the solutions by (14.6-12) and
(14.6-5).

7. Find the area defined by $x = r \cos\theta$, $y = r \sin\theta$, $z = \theta$, $0 \le r \le 1$, $0 \le \theta \le 2\pi$.
Describe the surface.





8. Find the area of the spheroid $x = a \sin\phi \cos\theta$, $y = a \sin\phi \sin\theta$, $z = c \cos\phi$,
distinguishing the cases $a \ge c > 0$, $c > a > 0$.

9. Find the total area of the part of the sphere $x^2 + y^2 + z^2 = 4a^2$ inside the cylinder
$x^2 + y^2 = 2ay$, (a) using (14.6-12) and polar co-ordinates, (b) using the type of
parametric representation occurring in Exercise 1. In (b) the main problem is to find the
proper region in the $\theta\phi$-plane to correspond to the first octant portion of the required
area.

10. Find the total area of the part of the cylinder $x^2 + z^2 = a^2$ inside the cylinder
$x^2 + y^2 = ax$.

11. Find the area of the portion of the surface $y^2 + z^2 = 4ax$ in the first octant,
between $x = 0$ and $x = 3a$, and inside the cylinder $y^2 = ax$. Solve (a) by using (14.6-12), and
(b) by using the parametric representation $x = r^2/(4a)$, $y = r \cos\theta$, $z = r \sin\theta$. Show that
the latter method is equivalent to using the counterpart of (14.6-12) for projection on the
yz-plane, and then introducing polar co-ordinates to do the integration.

12. Show that the area on the sphere $x^2 + y^2 + z^2 = c^2$ and inside the paraboloid
$(x^2/a) + (y^2/b) = 2(z + c)$ is $4\pi c\sqrt{ab}$, provided that $0 < b \le a \le c$.

13. Prove the equality in (14.6-7).

14. On the surface $z = f(x, y)$ consider the locus of points where the angle $\gamma$ is
constant. Suppose the projection of this locus on the xy-plane is a closed curve bounding
an area $A = \phi(\gamma)$. Show that it is plausible to think that the area of the part of the surface
on which $\gamma_0 \le \gamma \le \gamma_1$ (where $0 \le \gamma_0 \le \gamma_1 \le \pi/2$) is

$$\int_{\gamma_0}^{\gamma_1} \sec\gamma\, \phi'(\gamma)\, d\gamma.$$

Check by applying to the hemisphere $z = \sqrt{a^2 - x^2 - y^2}$.

15. Find the area of the portion of the cylinder surface $y^2 + z^2 = a^2$ which is in the
first octant and inside the cylinder $(x - a)^2 + y^2 = a^2$. Suggestion: As one convenient
possible parametrization use $u = \theta$, $v = x$, where $y = a \cos\theta$, $z = a \sin\theta$. From symmetry
it may be seen that one half of the desired area comes from the part of the cylindrical
surface corresponding to the part of the $\theta x$-plane defined by $a(1 - \sin\theta) \le x \le a$,
$0 \le \theta \le \pi/2$.



15 / LINE AND SURFACE 
INTEGRALS 


15 / INTRODUCTION 

In this chapter we consider some new concepts, the concepts of line integrals 
and surface integrals. These new kinds of integrals will be defined as limits of 
sums in the same general way that single and double integrals are defined. An 
ordinary single integral

$$\int_a^b f(x)\, dx$$

is an integral of a function which is defined along a line segment (an interval of a
co-ordinate axis). There is a corresponding kind of integral for a function which
is defined along a curve. Such an integral might well be called a curvilinear
integral; the usual name is line integral, where line means, in general, a curved
line. Likewise, the concept of a surface integral is a generalization of the concept
of a double integral. A double integral

$$\iint_R f(x, y)\, dx\, dy$$

is an integral of a function which is defined on a region R in the xy-plane. A 
surface integral is an integral of a function which is defined on a surface. The 
double integral is a particular case, for the plane region R is a flat surface. 

These new kinds of integrals have important applications to geometry and 
physics. They are also tools of great usefulness in analytical reasoning. 


15.1 / POINT FUNCTIONS ON CURVES AND SURFACES 

In §10.5 we introduced the concept of a scalar point function, the essential idea 
of which is that of considering the function values as associated with a point 
rather than with the co-ordinates of that point. A notation such as $f(P)$ signifies
the value of the function $f$ at the point P. Now a point function may be defined
throughout some region of space, or merely on a curve or a surface. Thus, for 
instance, the curvature of a curve is a scalar point function defined only at points 
on the curve, while a unit vector normal to a smooth surface is a vector point 
function defined only on the surface. 

We shall need the concept of continuity for point functions defined merely
on curves or surfaces. Let $f(P)$ be defined on a curve C, and let $P_0$ be a fixed
point of C. We say that $f$ is continuous at $P_0$ if

$$\lim_{P \to P_0} f(P) = f(P_0),$$

it being understood, of course, that P can approach $P_0$ in any manner consistent
with its being on C. The arithmetical form of this definition is as follows: $f$ is
continuous at $P_0$ if to each positive number $\epsilon$ corresponds some positive number
$\delta$ (depending in general on $P_0$ and $\epsilon$) such that $|f(P) - f(P_0)| < \epsilon$ whenever P is
on C and at distance less than $\delta$ from $P_0$. The function is called continuous on C
if it is continuous at each point of C. A similar definition is made for continuity
on a surface.

A point function may, of course, be expressed as a function of co-ordinates 
when a co-ordinate system is introduced. 


15.12 / LINE INTEGRALS 

Let C be a curve in xyz-space. It may in particular be a plane curve, as for
instance a curve in the xy-plane. There are two directions along a curve; we may
arbitrarily choose to call one of these directions the positive direction, and the
opposite direction the negative direction. When this choice has been made we say
the curve is oriented, or given an orientation. If an arc is oriented, it has an
initial point A and a terminal point B.

An oriented simple closed curve has no initial or terminal point, but it is often
convenient to select some point of the curve and regard it as both the initial and
terminal point of the curve (see Fig. 134).

Fig. 134.

In what follows we shall limit our discussion to curves which are formed by 
joining a finite number of arcs end to end. Such a curve may intersect itself a 
great deal. If it does not intersect itself at all, it is either a simple closed curve or 
else it may be regarded as a single arc. 

Let C be an oriented curve with initial point A and terminal point B (A and
B may coincide, as in Fig. 134). Let F denote a scalar point function which is
defined and continuous along C. Let points $P_0, P_1, \ldots, P_n$ be chosen in order
along C, with $A = P_0$, $P_n = B$. Let $Q_k$ be any point of C between $P_{k-1}$ and $P_k$ (see
Fig. 135). In a given rectangular co-ordinate system let the co-ordinates of $P_k$ be
$(x_k, y_k, z_k)$, and let $\Delta x_k = x_k - x_{k-1}$. Form the sum

$$\sum_{k=1}^{n} F(Q_k)\, \Delta x_k; \tag{15.12-1}$$

if the sums of this sort have a limiting value as $n \to \infty$ and the greatest of the
chord lengths $P_0P_1, \ldots, P_{n-1}P_n$ approaches zero, we denote the limit by

$$\int_C F(P)\, dx,$$

and call it the line integral of F with respect to x along C. If P is the point
$(x, y, z)$, and if $F(P)$ is denoted by $f(x, y, z)$, an alternative notation for the line
integral is

$$\int_C f(x, y, z)\, dx.$$

Line integrals with respect to y or z are defined in the same way, with $\Delta y_k$ or $\Delta z_k$
replacing $\Delta x_k$ in (15.12-1).

Fig. 135.

To compute the value of a line integral, we use some parametric representation
of the curve C. Suppose the parametric equations of C are

$$x = \lambda(t), \quad y = \mu(t), \quad z = \nu(t), \quad a \le t \le b,$$

and suppose that x, y, and z have continuous derivatives with respect to t. We
further suppose that the points A and B correspond to $t = a$ and $t = b$,
respectively, and that $(x, y, z)$ traces out C from A to B as t goes from a to b.
Let the points $P_k$ on C correspond to points $t_k$ such that $a = t_0 < t_1 < \cdots < t_n = b$;
let $\Delta t_k = t_k - t_{k-1}$, and let $Q_k$ correspond to $t_k'$, where $t_{k-1} \le t_k' \le t_k$. The sum
(15.12-1) now takes the form

$$\sum_{k=1}^{n} f(\lambda(t_k'), \mu(t_k'), \nu(t_k'))\,[\lambda(t_k) - \lambda(t_{k-1})]. \tag{15.12-2}$$

By the law of the mean,

$$\lambda(t_k) - \lambda(t_{k-1}) = \lambda'(\tau_k)\, \Delta t_k,$$

where $\tau_k$ is some number between $t_{k-1}$ and $t_k$. Therefore

$$\int_C f(x, y, z)\, dx = \int_a^b f(\lambda(t), \mu(t), \nu(t))\, \lambda'(t)\, dt.$$

In drawing this conclusion we use a standard theorem about definite integrals;
this theorem appears as (18.21-4), §18.21. It is a special case of Duhamel's
principle. This argument shows that the limit defining the line integral exists
provided C has a parametric representation in which x, y, z have continuous
derivatives with respect to t.

The result may be written

$$\int_C f(x, y, z)\, dx = \int_a^b f(x, y, z)\, \frac{dx}{dt}\, dt, \tag{15.12-3}$$

where the integral on the right is an ordinary definite integral of a function of t,
whose integrand is found by expressing x, y, z in terms of t from the parametric
equations of C.

Example 1. Find the values of

$$\text{(a)} \int_C (xy + y^2 - xyz)\, dx \quad \text{and} \quad \text{(b)} \int_C (x^2 - xy)\, dy$$

if C is the arc of the parabola $y = x^2$, $z = 0$ from $(-1, 1, 0)$ to $(2, 4, 0)$, shown in
Fig. 136.

Here we use x as the parameter. The integral (a) becomes

$$\int_C (xy + y^2 - xyz)\, dx = \int_{-1}^{2} (x^3 + x^4)\, dx = \frac{15}{4} + \frac{33}{5} = \frac{207}{20}.$$

For (b) we have $dy/dx = 2x$, and so

$$\int_C (x^2 - xy)\, dy = \int_{-1}^{2} (x^2 - x^3)\, 2x\, dx = \frac{15}{2} - \frac{66}{5} = -\frac{57}{10}.$$
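The two values in Example 1 can be cross-checked numerically. The following is a minimal sketch that evaluates the right-hand side of (15.12-3) with x as the parameter; the helper names `simpson` and `line_integral` are illustrative choices rather than notation from the text, and Simpson's rule is just one convenient way to approximate the ordinary definite integral.

```python
def simpson(g, a, b, n=2000):
    """Composite Simpson approximation to the integral of g from a to b (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def line_integral(f, curve, dcoord_dt, a, b):
    """Approximate the integral of f(x, y, z) * (d coord / dt) dt from t = a to t = b.

    curve(t) returns the point (x, y, z); dcoord_dt(t) is the derivative of the
    co-ordinate (x, y, or z) with respect to which we integrate.
    """
    return simpson(lambda t: f(*curve(t)) * dcoord_dt(t), a, b)


# The arc y = x^2, z = 0 from (-1, 1, 0) to (2, 4, 0), with x itself as parameter t.
parabola = lambda t: (t, t * t, 0.0)

# (a) integral of (xy + y^2 - xyz) dx; here dx/dt = 1.
value_a = line_integral(lambda x, y, z: x * y + y * y - x * y * z,
                        parabola, lambda t: 1.0, -1.0, 2.0)

# (b) integral of (x^2 - xy) dy; here dy/dt = 2t.
value_b = line_integral(lambda x, y, z: x * x - x * y,
                        parabola, lambda t: 2.0 * t, -1.0, 2.0)

print(value_a, 207 / 20)   # the two numbers should agree closely
print(value_b, -57 / 10)   # the two numbers should agree closely
```

Passing the derivative of the chosen co-ordinate separately keeps one routine usable for integrals with respect to x, y, or z.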



It is not essential that the parameter be increasing as 
we go along the curve in the positive direction. 

Example 2. Consider the first quadrant arc C of the circle $x^2 + y^2 = 1$ in the
xy-plane, oriented positively in the direction from (0, 1) to (1, 0) (see Fig. 137).
With the parametric representation $x = \cos\theta$, $y = \sin\theta$, the
initial point of C corresponds to $\theta = \pi/2$ and the terminal point
to $\theta = 0$. In evaluating a line integral, the lower limit of
integration will be $\theta = \pi/2$, and the upper limit $\theta = 0$. For
instance, $dy = \cos\theta\, d\theta$, and so

$$\int_C x^2 y\, dy = \int_{\pi/2}^{0} \cos^2\theta \sin\theta \cos\theta\, d\theta = -\frac{1}{4}.$$



The value of a line integral depends on the orientation which is assigned to 
the curve; if the orientation is reversed, the value of the integral is replaced by 
the negative of its former value. This is because the limits of integration are 
reversed in (15.12-3). 
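The effect of orientation can be seen directly from the defining sums of the type (15.12-1). The sketch below forms such sums for the integral of $x^2 y$ with respect to y along the quarter circle of Example 2, once in the stated orientation and once reversed. The function name `riemann_sum_dy`, the choice of $Q_k$ at the midpoint parameter, and the number of subdivisions are assumptions made only for illustration.

```python
import math


def riemann_sum_dy(F, point, t_start, t_end, n):
    """Sum of F(Q_k) * (y_k - y_{k-1}) over n sub-arcs of a plane curve.

    point(t) returns (x, y); the curve is traversed from t_start to t_end,
    and Q_k is taken at the midpoint parameter value of each sub-arc.
    """
    total = 0.0
    for k in range(1, n + 1):
        t_prev = t_start + (t_end - t_start) * (k - 1) / n
        t_k = t_start + (t_end - t_start) * k / n
        q_x, q_y = point(0.5 * (t_prev + t_k))
        total += F(q_x, q_y) * (point(t_k)[1] - point(t_prev)[1])
    return total


circle = lambda theta: (math.cos(theta), math.sin(theta))
F = lambda x, y: x * x * y

# Orientation of Example 2: from (0, 1) to (1, 0), i.e. theta from pi/2 down to 0.
forward = riemann_sum_dy(F, circle, math.pi / 2, 0.0, 2000)
# Reversed orientation: from (1, 0) to (0, 1).
backward = riemann_sum_dy(F, circle, 0.0, math.pi / 2, 2000)

print(forward)    # close to -1/4
print(backward)   # close to +1/4
```

Each term of the reversed sum is the negative of the corresponding term of the original sum, because every $\Delta y_k$ changes sign while the points $Q_k$ stay the same.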

The value of a line integral does not depend upon the particular parametric
representation of the curve which is used to calculate the value of the integral.

A sum of line integrals with respect to x, y, and z is often written with just
one integral sign. Thus,

$$\int_C f(x, y, z)\, dx + g(x, y, z)\, dy + h(x, y, z)\, dz$$

means

$$\int_C f(x, y, z)\, dx + \int_C g(x, y, z)\, dy + \int_C h(x, y, z)\, dz.$$

Example 3. Compute the value of

$$\int_C xz\, dx + x\, dy - yz\, dz \tag{15.12-4}$$

along the oriented curve shown in Fig. 138, consisting of a quarter circle in the
xz-plane, and line segments in the xy-plane and yz-plane, respectively. Denote the
three parts of C by $C_1$, $C_2$, $C_3$, respectively. On $C_1$ we choose x as parameter.
Then $z = \sqrt{1 - x^2}$, $y = 0$, so $dy = 0$, and

$$\int_{C_1} xz\, dx + x\, dy - yz\, dz = \int_{C_1} xz\, dx + x \cdot 0 - 0 \cdot dz = \int_0^1 x\sqrt{1 - x^2}\, dx = \frac{1}{3}.$$

For $C_2$ we use y as parameter; the equations are $x = 1 - y$, $z = 0$; so $dz = 0$, and

$$\int_{C_2} xz\, dx + x\, dy - yz\, dz = \int_{C_2} 0 \cdot dx + x\, dy - 0 \cdot 0 = \int_0^1 (1 - y)\, dy = \frac{1}{2}.$$

Finally, using z as parameter on $C_3$, we have $x = 0$, $y = 1$, $dx = 0$, $dy = 0$, and the
integral over $C_3$ is just

$$\int_{C_3} -yz\, dz = \int_0^1 -z\, dz = -\frac{1}{2}.$$

Thus the line integral (15.12-4) has the value

$$\frac{1}{3} + \frac{1}{2} - \frac{1}{2} = \frac{1}{3}.$$
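A numerical check of Example 3 also illustrates that the value does not depend on the parametrization. In the sketch below the quarter circle $C_1$ is described by an angle t ($x = \sin t$, $z = \cos t$) instead of by x; this substitution, the helper names, and the use of Simpson's rule are assumptions of the sketch, not part of the example as stated.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson approximation to the integral of g from a to b (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def piece(curve, velocity, a, b):
    """Integral of xz dx + x dy - yz dz along one parametrized piece of C."""
    def integrand(t):
        x, y, z = curve(t)
        dx, dy, dz = velocity(t)   # derivatives of x, y, z with respect to t
        return x * z * dx + x * dy - y * z * dz
    return simpson(integrand, a, b)


# C1: quarter circle in the xz-plane from (0, 0, 1) to (1, 0, 0).
c1 = piece(lambda t: (math.sin(t), 0.0, math.cos(t)),
           lambda t: (math.cos(t), 0.0, -math.sin(t)), 0.0, math.pi / 2)
# C2: segment x = 1 - y, z = 0 with y as parameter, from y = 0 to y = 1.
c2 = piece(lambda t: (1.0 - t, t, 0.0), lambda t: (-1.0, 1.0, 0.0), 0.0, 1.0)
# C3: segment x = 0, y = 1 with z as parameter, from z = 0 to z = 1.
c3 = piece(lambda t: (0.0, 1.0, t), lambda t: (0.0, 0.0, 1.0), 0.0, 1.0)

total = c1 + c2 + c3
print(c1, c2, c3, total)   # approximately 1/3, 1/2, -1/2, 1/3
```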


EXERCISES 

1. Find the values of the following line integrals. All the curves are in the xy-plane.

(a) $\int_C y^2\, dx - x\, dy$, along $y^2 = 4x$ from (0, 0) to (1, 2).

(b) $\int_C -y\, dx + x\, dy$, along $y^2 = 4x$ from (4, 4) to (0, 0).

(c) $\int_C (4x - y)\, dx$, along $y = 8x - 2x^2$ from (4, 0) to (0, 0).

(d) $\int_C (4x - y)\, dy$, along the curve of part (c).

(e) $\int_C x^2\, dy$, along the curve $y = x^3 - 3x^2 + 2x$ from (0, 0) to (2, 0).

(f) $\int_C x\, dy - y\, dx$, along $2y = 3x + 2$ from (2, 4) to (4, 7).

(g) $\int_C (y - x)\, dx + x^2 y\, dy$, along $y^2 = x^3$ from (1, -1) to (1, 1).

(h) $\int_C (x^2 - y^2)\, dx + x\, dy$, along the first quadrant arc of $x^2 + y^2 = 4$, from (0, 2) to (2, 0).

2. Find the values of the following line integrals:

(a) $\int_C x^2 y^2\, dx + x y^2\, dy$, counterclockwise around the closed curve formed by parts of the
line $x = 1$ and the parabola $y^2 = x$.

(b) $\int_C (x^2 - y^2)\, dx + 2xy\, dy$, counterclockwise around the square formed by the lines
$x = 0$, $x = 2$, $y = 0$, $y = 2$.


</«> j 




dy, counterclockwise around the circle x 2 + y 2 = a : 


<cx + y J 

(d) $\int_C x^2 y\, dx$, counterclockwise around the circle $x^2 + y^2 = a^2$.

(e) $\int_C -3y\, dx + 2x\, dy + 4z\, dz$, around the circle $x^2 + y^2 = 1$, $z = 1$, in the counterclockwise
sense as viewed from (0, 0, 2).

(f) $\int_C (x^2 - y^2)\, dx + x\, dy$, in the counterclockwise sense around the circle $x^2 + y^2 = 4$.

(g) $\int_C (x\sqrt{3} - y)\, dx + (y\sqrt{3} + x)\, dy$, counterclockwise around the circle $x^2 + y^2 = 1$.

(h) $\int_C -y^2\, dx + x^2\, dy$, counterclockwise around the closed curve formed by the upper
half of the ellipse $(x^2/a^2) + (y^2/b^2) = 1$ and the x-axis from $x = -a$ to $x = a$.


°^3. Find the value of | — 
JcX 


^dx+-2^— 

y * +y 


2 dy in each of the following cases: 


(a) If C is the counterclockwise arc of $x^2 + y^2 = 2$ from (1, 1) to $(-\sqrt{2}, 0)$.

(b) If C is the line $x = 1$ from (1, 0) to $(1, \sqrt{3})$.

(c) If C is the line $x + y = 1$ from (0, 1) to (1, 0).

4. Calculate $\int_C y\, dx - x\, dy + dz$, where C is the arc of the helix $x = a \sin t$,
$y = a \cos t$, $z = t$, from $t = 0$ to $t = \pi/2$.

5. Calculate $\int_C \sqrt{y}\, dx + 2x\, dy + 3y\, dz$, where C is the arc of $x = t$, $y = t^2$, $z = t^3$ from
$t = 1$ to $t = 2$.


6. Calculate 



dx from (0, 0, 1) to (V2/2, V2/2, 0) along the first-octant 


part of the curve of intersection of the plane x = y and the cylinder 2y 2 + z 2 = 1. 

7. Calculate $\int_C (z/y)\, dx + (x^2 + y^2 + z^2)\, dz$ from (0, 1, 4) to (1, 0, 6) along the
first-octant part of the intersection of $x^2 + y^2 = 1$ and $z = 2x + 4$.

8. Calculate $\int_C y\, dx - y(x - 1)\, dy + y^2 z\, dz$ along the first-octant part of the curve
$x^2 + y^2 + z^2 = 4$, $(x - 1)^2 + y^2 = 1$ from (2, 0, 0) to (0, 0, 2).

9. Calculate each of the integrals
(a) $\int_C z^2\, dx$, (b) $\int_C x^2\, dy$, (c) $\int_C y^2\, dz$ from (2, 0, 0) to $(0, 4/\sqrt{3}, 2/\sqrt{3})$ along the
first-octant part of the ellipse defined by $x^2 + y^2 - z^2 = 4$, $2z = y$.

10. Consider the integral $\int_C (z + y)\, dx + (x + z)\, dy + (y + x)\, dz$, where C is

(a) The broken line joining (0, 0, 0), (1, 0, 0), (1, 1, 0), and (1, 1, 1) in that order.

(b) The straight line from (0, 0, 0) to (1, 1, 1).

(c) The broken line joining (0, 0, 0), (0, 0, 1), and (1, 1, 1) in that order.

Calculate the line integral in each case and show that the values are all equal.

11. Prove that the line integral in Exercise 10 has the same value for all curves C
with initial point at (0, 0, 0) and terminal point at (1, 1, 1). Hint: If the curve is expressed
in terms of a parameter t, consider $F'(t)$, where $F(t) = xy + yz + zx$ when x, y, z are
expressed in terms of t.

12. Let C be the clockwise closed curve bounded by the lines $x = a$, $x = b$, the
x-axis, and a curve $y = f(x)$, $a \le x \le b$, assuming that $a < b$ and that $f(x)$ is continuous
and never negative. Using results from elementary calculus,
show (a) that $\int_C y\, dx$ is the area enclosed by C; (b) that
$\int_C \pi y^2\, dx$ is the volume generated when this area is revolved
around the x-axis; (c) that $\int_C xy\, dx$ is the first moment of this
area with respect to the y-axis; (d) that $\int_C \tfrac{1}{2} y^2\, dx$ is the first
moment of the area with respect to the x-axis; (e) that $\int_C x^2 y\, dx$
is the second moment of the area with respect to the y-axis. It can
be shown later, after we have learned more about line integrals,
that these same interpretations may be made for the foregoing line
integrals if C is any sectionally smooth, simple closed curve in the
xy-plane (except that in (b) we must require that the curve lie
entirely on one side of the x-axis).

Fig. 139.

13. Using Fig. 139 explain why it appears correct to say that,
if C is a simple closed curve oriented counterclockwise, $\int_C x\, dy$ is equal to the area enclosed
by C.

14. Using Fig. 139 as a guide, set up a line integral with respect to y, giving the
volume of the solid generated when the area enclosed by C is revolved around the x-axis.
Assume, as in the figure, that the curve lies entirely above the x-axis.



15.13 /VECTOR FUNCTIONS AND LINE INTEGRALS. WORK 

Consider a line integral of the form

$$\int_C P\, dx + Q\, dy + R\, dz, \tag{15.13-1}$$

where P, Q, R are continuous functions defined along a certain oriented curve C.
Such integrals often occur in connection with vector point functions, and we
shall now indicate how the integral (15.13-1) can be expressed in a different
notation by the use of vectors.

Let

$$\mathbf{F}(x, y, z) = P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}$$

be the vector function defined at each point of C in such a way that P, Q, R are
its components in the xyz-co-ordinate system. Let s denote arc length along C,
with $s = 0$ at the initial point of C and $s = L$ at the terminal point. We assume
that C is smooth. Then the unit vector tangent to C in the positive direction at a
given point is

$$\mathbf{T} = \frac{dx}{ds}\mathbf{i} + \frac{dy}{ds}\mathbf{j} + \frac{dz}{ds}\mathbf{k}.$$

Therefore, by the formula for dot products,

$$\mathbf{F} \cdot \mathbf{T} = P\frac{dx}{ds} + Q\frac{dy}{ds} + R\frac{dz}{ds}.$$

But, if we use s as the parameter along C, we know by (15.12-3) that

$$\int_C P\, dx = \int_0^L P\, \frac{dx}{ds}\, ds,$$

with similar formulas involving Q and R.
Therefore, we see that

$$\int_C P\, dx + Q\, dy + R\, dz = \int_0^L (\mathbf{F} \cdot \mathbf{T})\, ds. \tag{15.13-2}$$


Let $\psi$ be an angle between the vectors F and T, with
the understanding that $0 \le \psi \le \pi$; see Fig. 140. Since T has
unit length and the length of F is

$$(P^2 + Q^2 + R^2)^{1/2},$$

we know that

$$\mathbf{F} \cdot \mathbf{T} = (P^2 + Q^2 + R^2)^{1/2} \cos\psi.$$

Therefore

$$\int_C P\, dx + Q\, dy + R\, dz = \int_0^L (P^2 + Q^2 + R^2)^{1/2} \cos\psi\, ds. \tag{15.13-3}$$

Fig. 140.

From this formula we can get a useful inequality concerning the value of the line
integral. Let M be the maximum length of the vector F. Then

$$|(P^2 + Q^2 + R^2)^{1/2} \cos\psi| \le M$$

at all points of C, and so

$$\left| \int_C P\, dx + Q\, dy + R\, dz \right| \le ML. \tag{15.13-4}$$

All these considerations apply to integrals of the form

$$\int_C P\, dx + Q\, dy$$

where P and Q are continuous functions defined along a curve C in the
xy-plane. Here we can think of P and Q as components of a vector F lying in
the xy-plane.

The line integral notation is often used for integrals with respect to s, if s is
arc length in the positive direction along C. If g is any function of s, a commonly
used notation is to write

$$\int_C g\, ds \quad \text{instead of} \quad \int_0^L g\, ds.$$

Example 1. Let C be the semicircle $(x - a)^2 + y^2 = a^2$, $y \ge 0$, from (2a, 0) to
(0, 0). Let F be a vector of constant length ($\|\mathbf{F}\| = c$) directed from (x, y) toward (0, 0).
Calculate

$$\int_C \mathbf{F} \cdot \mathbf{T}\, ds.$$

In this case $\psi = (\pi/2) - (\phi/2)$ (see Fig. 141) and $\mathbf{F} \cdot \mathbf{T} = c \cos\psi = c \sin(\phi/2)$.
Also, $s = a\phi$, so

$$\int_C \mathbf{F} \cdot \mathbf{T}\, ds = \int_0^{\pi} c \sin\frac{\phi}{2}\, a\, d\phi = 2ac.$$
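The value 2ac can be checked without using the geometric relation between $\psi$ and $\phi$. The sketch below builds F and T from co-ordinates, taking $\phi$ to be the central angle of the semicircle measured from the point (2a, 0), and integrates $\mathbf{F} \cdot \mathbf{T}$ with respect to $s = a\phi$. The numerical values of a and c, the helper names, and the midpoint rule (chosen so that the endpoint (0, 0), where the direction of F is undefined, is never sampled) are assumptions of the sketch.

```python
import math


def midpoint_rule(g, a, b, n=20000):
    """Midpoint approximation to the integral of g from a to b."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


a, c = 1.5, 2.0   # illustrative radius a and force magnitude c


def f_dot_t(phi):
    """F . T at the point of the semicircle with central angle phi."""
    x, y = a + a * math.cos(phi), a * math.sin(phi)
    r = math.hypot(x, y)
    fx, fy = -c * x / r, -c * y / r           # length c, directed toward (0, 0)
    tx, ty = -math.sin(phi), math.cos(phi)    # unit tangent, from (2a, 0) toward (0, 0)
    return fx * tx + fy * ty


# ds = a dphi, with phi running from 0 to pi along C.
integral = midpoint_rule(lambda phi: f_dot_t(phi) * a, 0.0, math.pi)

print(f_dot_t(1.0), c * math.sin(0.5))   # pointwise agreement with c sin(phi/2)
print(integral, 2 * a * c)               # agreement with 2ac
```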

One of the important physical applications of 
line integrals is to the concept of work in analytic 
mechanics. The most elementary definition of 
work is that which is given when a constant force 
acts on a particle while the particle moves in a 
straight path along the line of action of the force. 

In this case the work is defined as the product of the magnitude of the force and the 
distance traversed. In elementary calculus this definition is generalized to cover the 
situation of a force of variable magnitude, the motion still being in a straight line. 
The work is then given by a definite integral. 

A general definition of work can be made in terms of a line integral. Suppose 
a particle moves along a curve C, and that while so moving it is acted on by a 
force F which may vary both in magnitude and direction. The work done by this 
force is defined to be 



$$W = \int_C \mathbf{F} \cdot \mathbf{T}\, ds. \tag{15.13-5}$$

It is seen from Fig. 140 that this definition is in accord with physical intuition. 
Among other things we see that the component of F perpendicular to C 
contributes nothing to the work, which is all done by the tangential component. 
Also, (15.13-5) agrees with the more elementary definitions already referred to in 
the appropriate special cases. 

If the x, y, and z components of F are $F_1$, $F_2$, $F_3$, respectively, the work can
be expressed as the line integral

$$W = \int_C F_1\, dx + F_2\, dy + F_3\, dz.$$

If time t is introduced as the parameter, we know that the vector velocity of
the moving particle is

$$\mathbf{v} = \frac{ds}{dt}\, \mathbf{T}.$$

Thus

$$\mathbf{T}\, ds = \mathbf{v}\, dt, \qquad \mathbf{F} \cdot \mathbf{T}\, ds = \mathbf{F} \cdot \mathbf{v}\, dt,$$

and the formula for work becomes

$$W = \int_{t_0}^{t_1} \mathbf{F} \cdot \mathbf{v}\, dt, \tag{15.13-6}$$

where $t_0$ and $t_1$ are the initial and final values of t. This formula is convenient if
the path of the particle is defined by giving x, y, z as functions of t. It should be
noted, however, that the work does not depend on the particular law of motion,
but only on the force and the path which is followed by the particle.

Example 2. A particle moves in the xy-plane according to the law

$$x = 64\sqrt{3}\, t, \quad y = 64t - 16t^2,$$

and is acted on by a force F which is directly proportional to the velocity in
magnitude, but opposite in sense to the velocity. Find the work done by F from
$t = 0$ to $t = 4$.

The components of velocity are

$$\frac{dx}{dt} = 64\sqrt{3}, \quad \frac{dy}{dt} = 64 - 32t.$$

Hence the components of F are

$$F_1 = -64\sqrt{3}\, c, \quad F_2 = -(64 - 32t)\, c,$$

where c is a positive constant of proportionality. Hence

$$\mathbf{F} \cdot \mathbf{v} = -c\{(64\sqrt{3})^2 + (64 - 32t)^2\},$$

and the work is

$$W = \int_0^4 \mathbf{F} \cdot \mathbf{v}\, dt = -1024c \int_0^4 (16 - 4t + t^2)\, dt,$$

$$W = -\frac{163{,}840}{3}\, c.$$

EXERCISES 

1. Find the value of the line integral in Example 1 if F, instead of having constant 
length, always reaches just to the origin. 

2. A point moves from (0, 0) to (2a, 0) along the semicircle in Fig. 141. It is acted on 
by a force of constant magnitude 2, whose direction makes constant angles of 45° with 
both the positive co-ordinate axes. Find the work done by F. 

3. A particle of weight w descends from (0, 2) to (4, 0) along the parabola 8y = 
(x - 4) 2 . It is acted on by gravity and also by a horizontal force of magnitude equal to the 
y-coordinate of the point, acting in the positive x-direction. Find the total work done by 
these two forces.

4. The cycloid $x = a(\theta - \sin\theta)$, $y = a(1 - \cos\theta)$ is generated by a point fixed in the
circumference of a rolling circle. Let the point be acted on by a force of unit magnitude
directed toward the center of the rolling circle. (a) Find the work done by the force as
the particle moves from $\theta = 0$ to $\theta = \pi$. (b) How much of the work in (a) is done by the
vertical component of the force?

5. A weight is dragged along the x-axis from x = 0 to x = 7 (units in feet) by a string 
which passes over a pulley located at (16, 12) in the xy-plane. If the tension in the string is 
constantly 10 pounds, find the work done by the pulling force. 


15.2 / PARTIAL DERIVATIVES AT 
THE BOUNDARY OF A REGION 


In this section we take up a few matters concerning the meaning and behavior of 
partial derivatives at the boundary of a region. The discussion is relevant to an 
exact understanding of some later parts of this chapter. The student may, if he 
wishes, go directly on to §15.3, and read this section later. 

Let us recall the definition of a partial derivative. For the partial derivative
of $f(x, y)$ with respect to x at (a, b) we define

$$f_1(a, b) = \lim_{x \to a} \frac{f(x, b) - f(a, b)}{x - a}.$$

Ordinarily we require that the limit be the same when $x \to a$ from the right as
when $x \to a$ from the left. This presupposes that $f(x, y)$ is defined along the line
$y = b$ for some distance on either side of $x = a$. If, however, $f(x, y)$ is not defined
when $y = b$ and $x < a$, we require only that the limit exist as $x \to a$ from the
right. The restriction to this kind of one-sided limit is typically necessary in
considering the partial derivatives of $f$ at a point on the boundary of the region
in which the function is defined.


Example 1. Let R be the region shown in Fig. 142, defined by $x^2 + y^2 \le 1$,
and let $f(x, y) = (1 - x^2 - y^2)^{3/2}$. This function is not
defined if $x^2 + y^2 > 1$. If (a, b) is a point on the
boundary of R in the second quadrant, $f_1(a, b)$ must
be understood as a limit in which $x \to a$ from the right,
and $f_2(a, b)$ must be understood as a limit in which
$y \to b$ from below. At most other boundary points the
situation is similar. At (0, 1), however, there is an even
more unusual situation. Along the line $y = 1$, there is
only the single point $x = 0$ which belongs to R.
Therefore $f_1(0, 1)$ is not defined, because the
quotient

$$\frac{f(x, 1) - f(0, 1)}{x - 0}$$

is not defined; we cannot let $x \to 0$ either from the right or the left! Similar
considerations show that $f_1(0, -1)$ is undefined, and that $f_2(\pm 1, 0)$ are undefined.

Now let us look at the formula for $\partial f/\partial x$ when it is defined. The usual rules give

$$\frac{\partial f}{\partial x} = -3x(1 - x^2 - y^2)^{1/2}.$$

Let us define

$$g(x, y) = -3x(1 - x^2 - y^2)^{1/2}$$

at each point of R. Then $f_1(x, y) = g(x, y)$ at all points where $f_1(x, y)$ is defined.
But we observe that $g(x, y)$ is even defined at the points $(0, \pm 1)$, where $f_1(x, y)$ is
not defined. Moreover, g is continuous in R, and $g(0, \pm 1) = 0$. Therefore, even
though $f_1(0, 1)$ is not defined, it is true that

$$\lim_{(x, y) \to (0, 1)} f_1(x, y) = 0.$$

In situations like this example, it is customary to consider $f_1(0, 1)$ as being
defined by the limiting value which $f_1(x, y)$ approaches as $(x, y) \to (0, 1)$. This is a
conventional agreement which proves to be useful in practice. In the present
case, this convention permits us to say that $\partial f/\partial x$ is continuous throughout the
entire closed region R.
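The one-sided limits described in this example can be observed numerically. The sketch below uses the boundary point (-1, 0), where along the line y = 0 the function is defined only for x to the right of -1, and compares one-sided difference quotients with the value of g there; it also compares an ordinary central difference with g at an interior point. The sample points and step sizes are arbitrary choices made for illustration.

```python
def f(x, y):
    """f(x, y) = (1 - x^2 - y^2)^(3/2), used only at points of the closed unit disk."""
    return (1.0 - x * x - y * y) ** 1.5


def g(x, y):
    """g(x, y) = -3x (1 - x^2 - y^2)^(1/2), the formula for the x-derivative."""
    return -3.0 * x * (1.0 - x * x - y * y) ** 0.5


# One-sided quotients at the boundary point (-1, 0): x approaches -1 from the right.
steps = [1e-2, 1e-4, 1e-6, 1e-8]
quotients = [(f(-1.0 + h, 0.0) - f(-1.0, 0.0)) / h for h in steps]

# Ordinary (two-sided) central difference at the interior point (0.3, 0.4).
h = 1e-5
central = (f(0.3 + h, 0.4) - f(0.3 - h, 0.4)) / (2.0 * h)

print(quotients, g(-1.0, 0.0))   # quotients shrink toward g(-1, 0) = 0
print(central, g(0.3, 0.4))      # the two numbers should agree closely
```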

A general statement of this convention may be made as follows: If R is a
closed region, and $f$ is defined in R, we agree to say that $\partial f/\partial x$ is continuous in R
if there is some function g which is defined and continuous in R
and such that $\partial f/\partial x = g$ at all interior points of R. Similar conventions are made
pertaining to $\partial f/\partial y$ and to partial derivatives of higher order. Also, similar
conventions are made for functions $f(x, y, z)$ defined in a closed region of
three-dimensional space.

The foregoing convention is useful when it comes to considering normal
derivatives at the boundary of a region. Such considerations are quite important in the
physical applications of line and surface integrals (in potential theory, for example).

Suppose a certain part of the boundary of R consists of a smooth arc. Let $P_0$
be a point of this arc, and let n be a vector of unit length
perpendicular to the boundary at $P_0$, and pointing
outward from R. We call n the outer normal at $P_0$ (see
Fig. 143). Let $\alpha$ be the angle counterclockwise from the
positive x-direction to the direction of n. If $f$ is a
function defined in R, with first partial derivatives
which are continuous in R, the outer normal derivative
of $f$ at $P_0$ is defined to be

$$\frac{\partial f}{\partial n} = \frac{\partial f}{\partial x} \cos\alpha + \frac{\partial f}{\partial y} \sin\alpha, \tag{15.2-1}$$

where $\partial f/\partial x$ and $\partial f/\partial y$ are evaluated at $P_0$.

Fig. 143.

Observe that $\partial f/\partial n$ is not, strictly speaking, the rate of change of $f$ at $P_0$ in the
direction of n, because $f$ is not defined outside of R, and therefore such a rate of
change is meaningless. It is true, however, that $-\partial f/\partial n$ is the rate of change in the
inward direction at $P_0$ (i.e., in the direction opposite to that of n).

15.3 / GREEN’S THEOREM IN THE PLANE 

The subject of the present section is a very important theorem relating to line 
integrals around closed curves in the plane. More precisely, the theorem exhibits 
an exact relation between a line integral taken around the curve (or curves) 
forming the boundary of a region and a certain double integral taken over the 
region. With the aid of the theorem we can transform certain double integrals 
into line integrals, and vice versa. Such transformations are of the highest 
usefulness in many mathematical arguments, as we shall subsequently see in 
some instances. Many other instances abound in the literature dealing with the 
partial differential equations of applied mathematics. A widespread usage
sanctions the attachment to the theorem of the name of G. Green, an English
mathematician of the early 19th century. There is also justification for calling it
Gauss's theorem, after a great German mathematician of the same period.

THEOREM I. Let R be a closed and bounded region of the xy-plane. Let the
boundary of R consist of a finite number of simple closed curves which do not
intersect each other, and each of which is sectionally smooth. Let $P(x, y)$ and
$Q(x, y)$ be functions which are continuous and have continuous first partial
derivatives in R. Let C denote the aggregate of curves forming the boundary of
R, each oriented in such a way that the region is on the left as one advances
along the curve in the positive direction. Then

$$\int_C P\, dx + Q\, dy = \iint_R \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dx\, dy. \tag{15.3-1}$$

Definition. A region having the properties specified in Theorem I will be called a
regular region.

There are certain difficulties in proving Green’s theorem in the full generality 
of its statement. However, for regions of sufficiently simple shape the proof is 
quite easy. We shall begin by giving the proof for such easy cases, and then 
extending it somewhat. Then we shall proceed to illustrate the theorem in some 
special cases, and give some applications. We do not give a fully detailed proof 
of the theorem, but we give indications of such a proof. For further comments 
on the proof see §15.31. 

Before going further, we observe that the functions P and Q are independent
of one another, and hence formula (15.3-1) may be broken down into
two separate formulas, namely

$$\int_C P\, dx = -\iint_R \frac{\partial P}{\partial y}\, dx\, dy, \tag{15.3-2}$$

$$\int_C Q\, dy = \iint_R \frac{\partial Q}{\partial x}\, dx\, dy. \tag{15.3-3}$$

The proof of (15.3-1) is equivalent to proofs of (15.3-2) and (15.3-3) separately.

It should be mentioned that we assume the x and y axes have their usual
relation to each other, i.e., that a counterclockwise rotation of 90° is needed to
carry the positive x-axis into the position occupied by the positive y-axis. This
arrangement is responsible for the minus sign in (15.3-2) and the lack of it in
(15.3-3).
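Formulas (15.3-2) and (15.3-3) can be tested numerically on a simple region. The sketch below takes R to be the square $0 \le x \le 2$, $0 \le y \le 2$ with its boundary traversed counterclockwise, and uses $P = x^2 - y^2$, $Q = 2xy$; this choice of region and functions, the hand-supplied partial derivatives, and the helper names are assumptions made only for illustration.

```python
def simpson(g, a, b, n=200):
    """Composite Simpson approximation to the integral of g from a to b (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


P = lambda x, y: x * x - y * y
Q = lambda x, y: 2.0 * x * y
dP_dy = lambda x, y: -2.0 * y   # partial derivative of P with respect to y
dQ_dx = lambda x, y: 2.0 * y    # partial derivative of Q with respect to x

# Corners of the square, listed so that the region lies on the left.
corners = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]

line_P = 0.0   # integral of P dx around C
line_Q = 0.0   # integral of Q dy around C
for i in range(4):
    (x0, y0), (x1, y1) = corners[i], corners[(i + 1) % 4]
    dx, dy = x1 - x0, y1 - y0   # derivatives of x and y along the edge, 0 <= t <= 1
    line_P += simpson(lambda t: P(x0 + t * dx, y0 + t * dy) * dx, 0.0, 1.0)
    line_Q += simpson(lambda t: Q(x0 + t * dx, y0 + t * dy) * dy, 0.0, 1.0)


def double_integral(h, n=200):
    """Midpoint-rule approximation to the double integral of h over the square."""
    step = 2.0 / n
    return step * step * sum(h((i + 0.5) * step, (j + 0.5) * step)
                             for i in range(n) for j in range(n))


area_P = double_integral(dP_dy)
area_Q = double_integral(dQ_dx)

print(line_P, -area_P)                     # the two sides of (15.3-2)
print(line_Q, area_Q)                      # the two sides of (15.3-3)
print(line_P + line_Q, area_Q - area_P)    # the two sides of (15.3-1)
```

For these particular functions both quadrature rules are exact up to rounding, so the printed pairs agree to many decimal places.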

Let us assume that R has the simple form suggested by Fig. 144. That is,
suppose that there is an interval c ≤ x ≤ d such that for an x' outside the
interval the line x = x' does not intersect R, while for c < x' < d the line
x = x' intersects R in an interval. The lines x = c, x = d may intersect R
either in an interval or a single point. The boundary of R then consists of a
lower curve $y = Y_1(x)$, an upper curve $y = Y_2(x)$, and certain portions
(either a segment or a point) of each of the lines x = c, x = d. The positive
orientation of the boundary is shown in Fig. 144. Let $C_1$ and $C_2$ denote
the oriented lower and upper curves, respectively.

Definition. If a regular region R has the simple form just described, let us call it an
x-simple region. Likewise we may define a y-simple region.

Now consider formula (15.3-2) for an x-simple region R. There are no 
contributions to the line integral from the portions of C on the lines x = c, x = d, 
since dx = 0 along a segment of either line. Hence 

$$\int_C P\,dx = \int_{C_1} P\,dx + \int_{C_2} P\,dx.$$

But on $C_1$ we may take x as the parameter, and put $y = Y_1(x)$. Hence

$$\int_{C_1} P(x, y)\,dx = \int_c^d P(x, Y_1(x))\,dx.$$

Likewise,

$$\int_{C_2} P(x, y)\,dx = \int_d^c P(x, Y_2(x))\,dx = -\int_c^d P(x, Y_2(x))\,dx.$$

(Bear in mind the co-ordinates of the initial and terminal points in determining
the limits of integration.) Thus

$$\int_C P\,dx = -\int_c^d \{P(x, Y_2) - P(x, Y_1)\}\,dx. \tag{15.3-4}$$

Next consider the double integral, and use the iterated integral formula 
(13.3-6): 


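$$\iint_R \frac{\partial P}{\partial y}\,dx\,dy = \int_c^d dx \int_{Y_1}^{Y_2} \frac{\partial P}{\partial y}\,dy. \tag{15.3-5}$$

The y integration may now be performed with x held constant. The result, by
Theorem VIII, §1.53, is

$$\int_{Y_1}^{Y_2} \frac{\partial P}{\partial y}\,dy = P(x, Y_2) - P(x, Y_1). \tag{15.3-6}$$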

On combining (15.3-6) with (15.3-5) and comparing with (15.3-4), we see the 
truth of (15.3-2) for an x-simple region R. 
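
The argument above can be checked numerically on a concrete x-simple region. The following is a minimal sketch, not part of the original text: the region (between the hypothetical curves Y1(x) = x^2 and Y2(x) = x + 2 on -1 <= x <= 2) and the function P = x y^2 are illustrative assumptions, and both integrals are approximated by the midpoint rule.

```python
# Numerical check of formula (15.3-2) on an assumed x-simple region:
# lower curve Y1(x) = x^2, upper curve Y2(x) = x + 2, for c <= x <= d.

def P(x, y):
    return x * y * y

def dP_dy(x, y):
    return 2.0 * x * y

def Y1(x):
    return x * x          # lower boundary curve C1

def Y2(x):
    return x + 2.0        # upper boundary curve C2

c, d = -1.0, 2.0          # the two curves meet at x = c and x = d

def line_integral_P_dx(n=4000):
    """Integral of P dx around the positively oriented boundary."""
    h = (d - c) / n
    total = 0.0
    for i in range(n):
        x = c + (i + 0.5) * h
        total += P(x, Y1(x)) * h   # along C1, x increases from c to d
        total -= P(x, Y2(x)) * h   # along C2, x decreases from d to c
    return total

def double_integral_dP_dy(nx=400, ny=400):
    """Iterated integral of dP/dy over the region, as in (15.3-5)."""
    hx = (d - c) / nx
    total = 0.0
    for i in range(nx):
        x = c + (i + 0.5) * hx
        hy = (Y2(x) - Y1(x)) / ny
        for j in range(ny):
            y = Y1(x) + (j + 0.5) * hy
            total += dP_dy(x, y) * hx * hy
    return total

lhs = line_integral_P_dx()
rhs = -double_integral_dP_dy()
print(lhs, rhs)   # the two values should agree closely
```

For this choice of P and region both sides can also be worked out by hand; the exact common value is -45/4.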

An entirely similar proof may be given for formula (15.3-3) if we assume 
that R is y-simple. The figure for this case would resemble Fig. 96 (§13.3). 
Finally, if R is both x-simple and y-simple, we combine (15.3-2) and (15.3-3) to 
give (15.3-1). Green’s theorem is thus easily proved for regions which are both
x-simple and y-simple. In particular, a bounded region R is both x-simple and
y-simple if its boundary consists of a single sectionally smooth convex curve. A 
rectangle is such a region. 

There are x-simple regions which are not y-simple, and regions which are 
neither x-simple nor y-simple. On the other hand, many regions may be divided 
into a finite number of subregions, each of which is both x-simple and y-simple. 
For such a region it is easy to prove Green’s theorem. For instance, suppose R is 
the region bounded between the circle and the large triangle in Fig. 145, with 
axes as shown. This region is neither x-simple nor y-simple, but we can divide it 
into four subregions, each of which is both x-simple and y-simple. The formula 
of Green’s theorem therefore holds for each of the subregions. If we add 
corresponding parts of the four formulas, the double integrals combine to give 
the correct double integral over the whole of R. Now consider the line
integrals. In dividing R into parts, we introduced four interior connecting
lines. Each of these lines occurs twice, but with opposite orientations in the
two occurrences, since each line belongs to the boundary of two neighbouring
subregions. Hence, when all the line integrals are added, the contributions
from these interior lines cancel out in pairs, leaving only the line integral
around the total oriented boundary of R, that is, counterclockwise around the
triangle and clockwise around the circle. In this way we obtain the proof of
Green’s theorem for this region. The idea of the proof may clearly be extended
to any region which may be decomposed into a finite number of subregions which
are both x-simple and y-simple. There are regions of the sort mentioned in
Theorem I which cannot be thus decomposed; Green’s theorem for such regions is
discussed in §15.31.

Fig. 145.


Example 1. Show that the area of a regular region R is given by any one of the
formulas

$$A = -\int_C y\,dx, \qquad A = \int_C x\,dy, \qquad A = \frac{1}{2}\int_C -y\,dx + x\,dy, \tag{15.3-7}$$

where C is the positively oriented boundary of R.

We verify the third formula only, leaving the others to the student. Putting
P = -y, Q = x in Green’s theorem, we get

$$\frac{1}{2}\int_C -y\,dx + x\,dy = \frac{1}{2}\iint_R (1 + 1)\,dx\,dy = A.$$
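
For a polygon, the third area formula reduces to a finite sum, since on the straight edge from (x0, y0) to (x1, y1) the integral of -y dx + x dy equals x0*y1 - x1*y0. The sketch below illustrates this; the function name `polygon_area` and the sample polygons are assumptions made for illustration, not part of the original text.

```python
def polygon_area(vertices):
    """Area of a simple polygon whose vertices are listed counterclockwise.

    Evaluates (1/2) * integral of (-y dx + x dy) around the boundary,
    edge by edge; each straight edge contributes x0*y1 - x1*y0.
    """
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return 0.5 * total

square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
print(polygon_area(square))    # area of the 2-by-2 square
print(polygon_area(triangle))  # area of the 3-4-5 right triangle
```

Listing the vertices clockwise instead reverses the orientation of C and changes the sign of the result.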


Example 2. Calculate $\int_C (x^2 - y^2)\,dx + 2xy\,dy$, where C is the counterclockwise
boundary of the square formed by the lines x = 0, x = 2, y = 0, y = 2.

Here we let R be the square region, and put $P = x^2 - y^2$, $Q = 2xy$ in Green’s
theorem. The line integral is equal to

$$\iint_R \{2y - (-2y)\}\,dx\,dy = 4\iint_R y\,dx\,dy.$$

We could easily calculate the value of this double integral, but it is also possible
to recognize its value by the following argument: If A is the area of R, and if
$(\bar{x}, \bar{y})$ is the centroid of R, then, as we know,

$$\iint_R y\,dx\,dy = A\bar{y}.$$

In this case A = 4 and $\bar{y} = 1$. Hence

$$\int_C (x^2 - y^2)\,dx + 2xy\,dy = 16.$$
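
The result of Example 2 can be confirmed by computing both sides of Green’s theorem numerically. This is an illustrative sketch only; the helper names `segment_integral` and `double_integral` are hypothetical, and the midpoint rule is used as a convenient approximation.

```python
def P(x, y):
    return x * x - y * y

def Q(x, y):
    return 2.0 * x * y

def segment_integral(p0, p1, n=2000):
    """Midpoint-rule value of the integral of P dx + Q dy along a straight segment."""
    (x0, y0), (x1, y1) = p0, p1
    dx = (x1 - x0) / n
    dy = (y1 - y0) / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        x = x0 + t * (x1 - x0)
        y = y0 + t * (y1 - y0)
        total += P(x, y) * dx + Q(x, y) * dy
    return total

# Corners of the square, listed counterclockwise.
corners = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
line_value = sum(
    segment_integral(corners[k], corners[(k + 1) % 4]) for k in range(4)
)

def double_integral(n=200):
    """Midpoint-rule value of the double integral of 4y over the square."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            y = (j + 0.5) * h
            total += 4.0 * y * h * h
    return total

area_value = double_integral()
print(line_value, area_value)   # both should be close to 16
```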

Example 3. Green’s theorem can be used to prove the following theorem of
Pappus: If R is a regular region lying entirely on one side of the x-axis, and if R is
revolved about the x-axis, the volume of the solid so generated is equal to $2\pi A\bar{y}$,
where A and $\bar{y}$ have the same significance as in Example 2.

For brevity, let us consider merely the case of an x-simple region, as
pictured in Fig. 144. The reasoning can be extended to the most general regular
region. The volume in question is, by elementary calculus,

$$V = \pi \int_c^d \{Y_2(x)\}^2\,dx - \pi \int_c^d \{Y_1(x)\}^2\,dx,$$

in the notation of Fig. 144. This is readily seen to be the same as the line integral

$$V = -\pi \int_C y^2\,dx,$$

with C oriented counterclockwise. We now apply Green’s theorem, with
$P = -\pi y^2$, $Q = 0$. Thus

$$V = \iint_R 2\pi y\,dx\,dy = 2\pi A\bar{y}.$$

This is what we wanted to prove.

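Example 4. Use Green’s theorem to deduce the integral formula

$$\iint_R \left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) dx\,dy = \int_C \frac{\partial u}{\partial n}\,ds, \tag{15.3-8}$$

where s refers to arc length along C and n refers to the outer normal to C.

Here it is assumed that R is a regular region and that u and its first and
second partial derivatives are continuous in R.

To derive (15.3-8) we start by applying Green’s theorem with

$$Q = \frac{\partial u}{\partial x}, \qquad P = -\frac{\partial u}{\partial y}.$$

Then we obtain

$$\iint_R \left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) dx\,dy = \int_C -\frac{\partial u}{\partial y}\,dx + \frac{\partial u}{\partial x}\,dy.$$

To complete the derivation we have only to show that on C

$$-\frac{\partial u}{\partial y}\frac{dx}{ds} + \frac{\partial u}{\partial x}\frac{dy}{ds} = \frac{\partial u}{\partial n}. \tag{15.3-9}$$

We know by (15.2-1) that

$$\frac{\partial u}{\partial n} = \frac{\partial u}{\partial x}\cos\alpha + \frac{\partial u}{\partial y}\sin\alpha.$$

We also know that

$$\frac{dx}{ds} = \cos\phi, \qquad \frac{dy}{ds} = \sin\phi, \tag{15.3-10}$$

where $\phi$ is the angle which the positive tangent to C makes with the positive
x-axis. It is easy to show by sketching a figure that as one traverses a curve in
such a way that the interior is always on the left, the outward normal lags behind
the tangent by $\pi/2$ radians. It follows that $\alpha = \phi - (\pi/2)$ (see Fig. 146) and
therefore

$$\cos\alpha = \frac{dy}{ds}, \qquad \sin\alpha = -\frac{dx}{ds}. \tag{15.3-11}$$

Fig. 146.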

The correctness of (15.3-9) is now apparent, and the proof of (15.3-8) is 
complete. 


EXERCISES 

1. Use Green’s theorem to evaluate the following line integrals: 

(a) $\int_C 2xy\,dx - 3xy\,dy$, clockwise around the square bounded by x = 3, x = 5, y = 1, y = 3.

(b) $\int_C xy^2\,dx + 2x^2y\,dy$, counterclockwise around the ellipse $4x^2 + 9y^2 = 36$.

(c) $\int_C (x^2 + 2y^2)\,dy$, counterclockwise around the circle $(x - 2)^2 + y^2 = 1$.

(d) $\int_C e^x \sin y\,dx + e^x \cos y\,dy$, around the boundary of any regular region.

(e) $\int_C x^2y\,dx - y^2x\,dy$, counterclockwise around the region bounded by $y = \sqrt{a^2 - x^2}$ and
y = 0. Use polar co-ordinates to evaluate the double integral.

(f) f ~ ' rl around the boundary of any regular region not containing the origin. 
Jc x + y 

2. Calculate the line integrals of Exercise 2, §15.12, parts (a), (d), (f), (g) and (h), 
using Green’s theorem. 

3. Let C be any sectionally smooth simple closed curve in the xy-plane, oriented
counterclockwise. Let R be the region bounded by C, and let R have area A and centroid
$(\bar{x}, \bar{y})$. Show that

$$\frac{1}{2}\int_C x^2\,dy = A\bar{x}, \qquad \int_C xy\,dy = A\bar{y},$$

and, if R is a lamina of constant unit density, interpret

$$\int_C -x^2y\,dx \qquad \text{and} \qquad \int_C -x^2y\,dx + xy^2\,dy$$

as moments of inertia, specifying the axis of rotation in each case.

4. Let R be the region bounded by the rays $\theta = \alpha$, $\theta = \beta$ and the curve $r = f(\theta)$, as
shown in Fig. 147. Use the third formula in (15.3-7) to show that the area of R is

$$A = \frac{1}{2}\int_\alpha^\beta \{f(\theta)\}^2\,d\theta,$$

by actually calculating the line integral, using r as parameter on the rays, and $\theta$ as
parameter on the curved side of R. This gives a new derivation of a formula which is
familiar from elementary calculus.

Fig. 147.

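5. Derive the formulas

$$\iint_R \left\{u\,\Delta v + \frac{\partial u}{\partial x}\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\frac{\partial v}{\partial y}\right\} dx\,dy = \int_C u\,\frac{\partial v}{\partial n}\,ds,$$

$$\iint_R (u\,\Delta v - v\,\Delta u)\,dx\,dy = \int_C \left(u\,\frac{\partial v}{\partial n} - v\,\frac{\partial u}{\partial n}\right) ds,$$

where the notation

$$\Delta u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}$$

is employed. Model your work somewhat after that of Example 4.

6. If v satisfies the condition $\Delta v = 0$ (Laplace’s equation) in R, show that

$$\int_C \frac{\partial v}{\partial n}\,ds = 0$$

and

$$\iint_R \left[\left(\frac{\partial v}{\partial x}\right)^2 + \left(\frac{\partial v}{\partial y}\right)^2\right] dx\,dy = \int_C v\,\frac{\partial v}{\partial n}\,ds.$$

7. Green’s theorem can be expressed in vector notation in two different ways.
Consider the plane region R of the xy-plane as being part of three-dimensional space. If
$F_1(x, y)$ and $F_2(x, y)$ are functions having continuous derivatives in R, let

$$\mathbf{F} = F_1(x, y)\,\mathbf{i} + F_2(x, y)\,\mathbf{j}$$

be a vector field defined in space. It has the special property that the z-component is zero
and that the x- and y-components are independent of z. Show that

$$\text{(a)} \quad \iint_R (\nabla \cdot \mathbf{F})\,dx\,dy = \int_C \mathbf{F} \cdot \mathbf{n}\,ds$$

and

$$\text{(b)} \quad \iint_R (\nabla \times \mathbf{F}) \cdot \mathbf{k}\,dx\,dy = \int_C \mathbf{F} \cdot \mathbf{T}\,ds,$$

where $\nabla \cdot \mathbf{F}$ and $\nabla \times \mathbf{F}$ are as defined in §§10.7 and 10.8, and n and T are the unit outer
normal and the unit positive tangent, respectively, on C. Formulas (15.3-10) and (15.3-11)
will be found useful.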


15.31 / COMMENTS ON THE PROOF OF GREEN’S THEOREM 

In the statement of Green’s theorem we required only that the region R be 
regular, whereas in the proof given in §15.3 we assumed that R could be 
decomposed into a finite number of subregions, each both x-simple and
y-simple. Now, not every regular region can be so divided, and so the proof of
Theorem I, §15.3, is not complete. An example of such a region is suggested by
Fig. 148; here three sides of the region are formed by straight lines, but the
fourth side (the top) is a smooth curve which oscillates more and more rapidly as
it approaches the origin, and crosses the x-axis an infinite number of times in the
interval shown. The equation $y = x^3 \sin(1/x)$ defines such a curve. Such a region
may be x-simple, but is not y-simple, and it cannot be divided into a finite 
number of y-simple subregions. It is complications of this sort at the boundary 
of R that make it somewhat difficult to give a complete proof of Theorem I. 

One method for completing the proof may be described in outline as follows: 
Let $R_1, R_2, \ldots, R_n, \ldots$ be a sequence of regions, with oriented boundaries
$C_1, C_2, \ldots$, having the following properties:

1. Each $R_n$ is a regular region lying in R.

2. Each $R_n$ can be divided into a finite number of subregions, each of which is both
x-simple and y-simple.

3. As $n \to \infty$, $R_n$ approaches R, and $C_n$ approaches C in such a way that

$$\iint_{R_n} \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy \to \iint_R \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy$$

and

$$\int_{C_n} P\,dx + Q\,dy \to \int_C P\,dx + Q\,dy.$$

Green’s theorem holds for each $R_n$, by (1) and (2) and what has already been
proved in §15.3. It therefore holds for R, by (3). The crux of the problem now is
the construction of the approximating regions $R_n$ so that conditions (1), (2), (3)
are satisfied. The details of proving such a construction possible are too intricate for 
consideration in this book. 

There are also other approaches to the problem of a complete proof. One 
important fact is that any regular region may be divided into a finite number of 
subregions each of which is either x-simple or y-simple. 

We conclude this section with the remark that, when a more advanced point
of view is taken, and the integral concepts are suitably generalized, Green’s
theorem, that is, formula (15.3-1), remains true under hypotheses much less
restrictive than those stated in Theorem I of §15.3. The relaxation of restrictions
applies not only to the character of the region R, but to the continuity
requirements on the functions P, Q and their derivatives.


15.32 / TRANSFORMATION OF DOUBLE INTEGRALS

In this section we consider the effect on the double integral

$$\iint_R F(x, y)\,dx\,dy \tag{15.32-1}$$

when we make a change of variable by equations

$$x = f(u, v), \qquad y = g(u, v). \tag{15.32-2}$$

As one important result of our work, we shall discover the significance of the
sign of the Jacobian of the mapping (15.32-2) from the uv-plane to the xy-plane.

The discussion is based on the assumption that the student is familiar with
Chapter 9, and with §9.2 in particular. We assume that equations (15.32-2)
establish a one-to-one mapping of a part of the uv-plane onto a part of the
xy-plane. Let the inverse mapping be defined by the equations

$$u = h(x, y), \qquad v = k(x, y). \tag{15.32-3}$$

We suppose the functions f, g to be continuous, together with their first and
second partial derivatives; we further assume that the Jacobian

$$J(u, v) = \frac{\partial(f, g)}{\partial(u, v)}$$

is always of the same sign, either always positive or always negative, and
hence never zero. Let R be a regular region all of whose points are interior to the
region of the xy-plane which is being mapped, and let R' be the corresponding
region in the uv-plane. Then R' is also a regular region, as one may show with
the aid of equations (15.32-3).


THEOREM II. Let A be the area of R. Then, under the foregoing assumptions,

$$A = \iint_{R'} |J(u, v)|\,du\,dv. \tag{15.32-4}$$

Proof. We start off with the fact (see Example 1 of §15.3) that

$$A = \int_C x\,dy, \tag{15.32-5}$$

where C is the complete boundary of R, oriented in the usual positive sense, i.e.,
so that the region R is on the left as one advances along the curve in the positive
direction. Let C' denote the boundary of R'. We orient C' by taking the positive
sense along C' to be that which corresponds, under the mapping, to the positive
sense along C; thus, as (x, y) moves along C in the positive sense, its image
point (u, v) moves along C' in the positive sense. With this agreement we have

$$\int_C x\,dy = \int_{C'} f\,\frac{\partial g}{\partial u}\,du + f\,\frac{\partial g}{\partial v}\,dv, \tag{15.32-6}$$

since

$$x = f(u, v) \quad \text{and} \quad dy = \frac{\partial g}{\partial u}\,du + \frac{\partial g}{\partial v}\,dv$$

hold for corresponding points of C and C'.

Next we apply Green’s theorem in the uv-plane to the line integral in
(15.32-6). Instead of P dx + Q dy we have

$$f\,\frac{\partial g}{\partial u}\,du + f\,\frac{\partial g}{\partial v}\,dv.$$

Therefore, corresponding to

$$\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \quad \text{we have} \quad \frac{\partial}{\partial u}\left(f\,\frac{\partial g}{\partial v}\right) - \frac{\partial}{\partial v}\left(f\,\frac{\partial g}{\partial u}\right).$$

On carrying out the indicated differentiations, this latter expression is found to
be precisely J(u, v). Consequently,

$$\int_{C'} f\,\frac{\partial g}{\partial u}\,du + f\,\frac{\partial g}{\partial v}\,dv = \pm \iint_{R'} J(u, v)\,du\,dv. \tag{15.32-7}$$

The choice of sign on the right is determined by the orientation of C'. If the
orientation which we have given to C' coincides with the usual positive
orientation of the boundary of R', the plus sign is correct; in the contrary case we
must choose the minus sign. Combining (15.32-5), (15.32-6), and (15.32-7), we
see that

$$A = \pm \iint_{R'} J(u, v)\,du\,dv.$$

Since A is positive and J is always of the same sign, it follows that the sign 
chosen in (15.32-7) must be the same as the sign of J. Whichever the sign, formula 
(15.32-4) is correct. 
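
Theorem II can be illustrated numerically. The sketch below is an assumption-laden example rather than part of the original text: it uses the mapping x = u^2 - v^2, y = 2uv with the hypothetical rectangle 1 <= u <= 2, 1 <= v <= 2 as R', computes the area of the image region R from the boundary integral (15.32-5), and compares it with the integral of |J| over R'. The Jacobian 4(u^2 + v^2) is positive here, so the counterclockwise boundary of R' maps to the counterclockwise boundary of R.

```python
def f(u, v):
    return u * u - v * v

def g(u, v):
    return 2.0 * u * v

def jacobian(u, v):
    # d(f, g)/d(u, v) = f_u * g_v - f_v * g_u = (2u)(2u) - (-2v)(2v)
    return 4.0 * (u * u + v * v)

U0, U1, V0, V1 = 1.0, 2.0, 1.0, 2.0   # assumed rectangle R' in the uv-plane

def area_from_jacobian(n=400):
    """Midpoint-rule value of the double integral of |J| over R'."""
    hu = (U1 - U0) / n
    hv = (V1 - V0) / n
    total = 0.0
    for i in range(n):
        u = U0 + (i + 0.5) * hu
        for j in range(n):
            v = V0 + (j + 0.5) * hv
            total += abs(jacobian(u, v)) * hu * hv
    return total

def area_from_boundary(n=4000):
    """Integral of x dy around the image of the boundary of R'."""
    corners = [(U0, V0), (U1, V0), (U1, V1), (U0, V1)]  # counterclockwise
    total = 0.0
    for k in range(4):
        (ua, va), (ub, vb) = corners[k], corners[(k + 1) % 4]
        for i in range(n):
            t0, t1 = i / n, (i + 1) / n
            tm = 0.5 * (t0 + t1)
            y0 = g(ua + t0 * (ub - ua), va + t0 * (vb - va))
            y1 = g(ua + t1 * (ub - ua), va + t1 * (vb - va))
            xm = f(ua + tm * (ub - ua), va + tm * (vb - va))
            total += xm * (y1 - y0)
    return total

a_jac = area_from_jacobian()
a_bdy = area_from_boundary()
print(a_jac, a_bdy)   # both approximate the area of R
```

For this mapping and rectangle the exact area is 56/3, which both computations should approach as the step sizes shrink.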

The last remarks enable us to justify the answer given to the question posed 
at the end of §9.2. Suppose R is a circular region. The positive orientation of its 
circumference C is counterclockwise. The image of C will be a simple closed 
curve C', and R' will consist of the interior of C' and C' itself. Hence, the usual 
positive orientation of C' will also be counterclockwise. But the mapping of R
onto R' induces a certain orientation of C'. From the discussion in the foregoing
paragraph we see that the induced orientation of C' is counterclockwise if and
only if the Jacobian of the mapping is positive.

If the regions R, R' are connected (if one is connected, so is the other) we
can apply the law of the mean to (15.32-4), and obtain the following result: Let
A' be the area of R'. There is some point (u, v) in R' such that

$$A = |J(u, v)|\,A'. \tag{15.32-8}$$

The magnitude of the Jacobian is therefore a measure of the distortion of areas
by the mapping. If |J| = 1 we say that the mapping is equiareal.

Let us now turn to the double integral (15.32-1).

THEOREM III. If F is continuous in R and if the mapping meets the conditions
stated prior to Theorem II, we have

$$\iint_R F(x, y)\,dx\,dy = \iint_{R'} F[f(u, v), g(u, v)]\,|J(u, v)|\,du\,dv. \tag{15.32-9}$$

Proof. Let R be divided into a finite number of connected regular subregions
$R_1, \ldots, R_n$ of areas $\Delta A_1, \ldots, \Delta A_n$. Let the corresponding regions in the uv-plane
be $R'_1, \ldots, R'_n$, of areas $\Delta A'_1, \ldots, \Delta A'_n$. Choose a point $(u_k, v_k)$ in $R'_k$ such that (by
(15.32-8))

$$\Delta A_k = |J(u_k, v_k)|\,\Delta A'_k,$$

and let $(x_k, y_k)$ be the corresponding point of $R_k$. For convenience write

$$G(u, v) = F[f(u, v), g(u, v)].$$

Then

$$\sum_{k=1}^{n} F(x_k, y_k)\,\Delta A_k = \sum_{k=1}^{n} G(u_k, v_k)\,|J(u_k, v_k)|\,\Delta A'_k.$$

We now pass to the limit as $n \to \infty$ and the maximum dimensions of the
subregions approach zero. By Theorem II, §13.23, we obtain formula (15.32-9) as
the result.

If we like we may interpret the variables u, v as curvilinear co-ordinates in
the xy-plane; this point of view was discussed in §9.5. If we consider a small cell
in the mesh of co-ordinate curves, we see from (15.32-8) that the area of the cell
with opposite vertices (u, v), $(u + \Delta u, v + \Delta v)$ is approximately $|J(u, v)|\,\Delta u\,\Delta v$.
By expressing the uv-integral in (15.32-9) as an iterated integral, we obtain a
formula for the evaluation of the double integral (15.32-1) by an iterated integral
in curvilinear co-ordinates. For polar co-ordinates such a result is already known
to us. If

$$x = r\cos\theta, \qquad y = r\sin\theta,$$

we readily compute

$$\frac{\partial(x, y)}{\partial(r, \theta)} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r.$$

Thus, for the transformation to polar co-ordinates, the Jacobian is the familiar 
factor r. 
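
As a small illustration of formula (15.32-9) in polar co-ordinates, the sketch below evaluates the double integral of x^2 + y^2 over the unit disk as an iterated integral in r and theta, including the Jacobian factor r. The integrand, the region, and the helper name `polar_integral` are assumptions chosen for illustration; the midpoint rule is used for the approximation.

```python
import math

def F(x, y):
    return x * x + y * y

def polar_integral(func, radius, nr=400, nt=400):
    """Midpoint-rule value of the double integral of func over the disk
    of the given radius, computed in polar co-ordinates with Jacobian r."""
    dr = radius / nr
    dt = 2.0 * math.pi / nt
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * dr
        for j in range(nt):
            th = (j + 0.5) * dt
            total += func(r * math.cos(th), r * math.sin(th)) * r * dr * dt
    return total

approx = polar_integral(F, 1.0)
print(approx, math.pi / 2.0)   # the exact value is pi/2
```

Omitting the factor r in the sum would give the integral of r^2 rather than r^3 over 0 <= r <= 1, and hence a different (incorrect) value.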
EXERCISES

1. The transformation x = u 2 - v 2 , y = 2uv maps the rectangle 1 ^ r s 3 

of the uv-plane into a certain region R of the xy-plane. Make a sketch of R and find its
area.

2. Consider the mapping x = au + bv, y = -bu + av (a, b constant). If a certain
region of unit area in the xy-plane corresponds to a region R' in the uv-plane, find the
area of R'.

3. The equations $u = x^2y - y^3$, $v = 2xy^2$ map points near (2, 1) in the xy-plane into
points near (3, 4) in the uv-plane. If R and R' are corresponding small regions containing
these respective points, find the approximate ratio of the area of R to that of R'.

4. Calculate $\iint_R x\,dx\,dy$ and the area of R, and so find $\bar{x}$ for R, where R is the region
bounded by the lines x = 3y, 2x + y = 0, x - 3y = 14, 2x + y = 21. Use the transformation
x = u + 3v, y = -2u + v.

5. Use the transformation x = au, y = bv to map the region R defined by
$(x^2/a^2) + (y^2/b^2) \le 1$ onto the uv-plane. Evaluate the integral


il( x p + p) dxdy 


with the aid of this transformation and polar co-ordinates in the uv-plane.


6. Calculate 


ii 


dx dy 
x + y 


, where R is the region in the xy-plane bounded by the lines 


x + y = 1, x + y = 4, y = 0, x = 0. Use the transformation x = u - uv, y = uv. 

7. Change the iterated integral [ dx f e (y ~ x)Ky+x) dy to a double integral and 

Jo Jo 

evaluate it with the aid of the transformation u = x + y, v = x - y.

8. Use the transformation x = Vu - n, y = n + u to evaluate the integral 

i>t. 


4-x 2 

2 x"+ y 


x dy 

— 


9. Let R be the first-quadrant region bounded by the curves $x^2 - y^2 = 1$, $x^2 - y^2 = 4$,
$x^2 + y^2 = 9$, $x^2 + y^2 = 16$. Calculate $\iint_R xy\,dx\,dy$ with the aid of the transformation
$u = x^2 - y^2$, $v = x^2 + y^2$.

10. Find the area of the first quadrant region bounded by xy = 4, xy = 16, y = x + 3,
y = x - 3. Use the transformation 2u = x - y, $v = \sqrt{xy}$.

11. Consider the mapping 

2x 2y 


?+ 7 ’ 


x 2 + y 2 ' 


Draw the circles m = 3, n = 3, v = i v = 1 in the xy-plane, and let R be the region bounded 
by them. Calculate the value of J j 

R 

12. Find the region in the $r\theta$-plane corresponding to the region R in the xy-plane
inside the circle $x^2 + y^2 = 2x$ and to the right of x = 1, assuming $x = r\cos\theta$, $y = r\sin\theta$,
r > 0,


- — < a < 


. Transform 


// 


x dx dy 

PT7 


to an integral in the $r\theta$-plane, and evaluate it.


13. Consider the families of curves x 3 = uy , y 3 — vx in the xy-plane. Show that the 
first-quadrant area bounded by the curves u = a 2 , u = b 2 , v = a 2 , v = (i 2 is i(a - b)( a - jS), 
if a > b > 0, a > (3 > 0. 


14. Use the transformation x = u + uv, y = v + uv to calculate the integral 


// 


dx dy 

[(x-y) 2 + 2(x + y)+ 1] 


" 175 ) where R is the triangle with vertices at (0, 0), (2, 0), (2, 2). 


15.4 / EXACT DIFFERENTIALS 

An expression such as 

M(x, y) dx + N(x, y) dy (15.4-1)

is called a first-order differential form in two variables. As examples we 
list 

$$(y \sin x - 1)\,dx + \cos x\,dy,$$

$$(x^2 + y^2)\,dx - 2xy\,dy.$$

The functions M and N occurring in the form are assumed to be defined in some 
region of the plane. In practice certain continuity and differentiability restric- 
tions will be imposed on the functions. 

The purpose of this section is to study those particular kinds of differential 
forms which are the differentials of functions of two variables. The following 
forms are of this kind: 


$$x\,dy + y\,dx = d(xy),$$

$$\frac{-y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy = d\tan^{-1}\left(\frac{y}{x}\right),$$

$$\frac{x}{x^2 + y^2}\,dx + \frac{y}{x^2 + y^2}\,dy = d\log(x^2 + y^2)^{1/2}.$$

A differential form is called an exact differential if it is the differential of some
function u at all points of some region in the xy-plane. Since

$$du = \frac{\partial u}{\partial x}\,dx + \frac{\partial u}{\partial y}\,dy,$$

this means that M dx + N dy is exact if there is some differentiable function u
such that

$$\frac{\partial u}{\partial x} = M, \qquad \frac{\partial u}{\partial y} = N \tag{15.4-2}$$

at each point of some region.

There are many nonexact differential forms, as we shall see later. 
In order to have enough precision for the statement of our theorems, we
make the following formal definition:

Definition. The differential form M dx + N dy is said to be exact at the point (a, b) if
there is some single-valued differentiable function u defined in some neighborhood
of (a, b), such that du = M dx + N dy at all points of the neighborhood.

Observe that, in defining exactness of a form at a point, we actually place a 
condition on the form at all points of some neighborhood of the point, not 
merely at the point itself. 

The fundamental problem relating to exact differentials may be stated in two 
parts: 

1. How can one tell by examining M and N whether or not the differential form is 
exact at a particular point? 

2. If the differential form is known to be exact, how can one actually find a function 
of which the form is the differential? 

Both of these questions can be answered with the aid of line integrals, 
provided we assume that M and N have continuous first partial derivatives. 


THEOREM IV. If M and N have continuous first partial derivatives at all points
of some open rectangle, the differential form is exact at each point of the
rectangle if and only if the condition

$$\frac{\partial M}{\partial y} = \frac{\partial N}{\partial x} \tag{15.4-3}$$

is satisfied throughout the rectangle. When this condition is satisfied, a function
u such that du = M dx + N dy is furnished by the line integral

$$u(x, y) = \int_{(a,b)}^{(x,y)} M(s, t)\,ds + N(s, t)\,dt \tag{15.4-4}$$

along the path from (a, b) to (x, y) shown in Fig. 149, where (a, b) is the center of
the rectangle.

Fig. 149.

Remark. s and t are used in place of x and y as variables in the line integral, since
(x, y) is a fixed point during the computation of the line integral.


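Proof of the Theorem. The necessity of condition (15.4-3) is proved as
follows: If the form is exact, there is some function u such that equations
(15.4-2) hold. Then

$$\frac{\partial M}{\partial y} = \frac{\partial^2 u}{\partial y\,\partial x} \quad \text{and} \quad \frac{\partial N}{\partial x} = \frac{\partial^2 u}{\partial x\,\partial y}.$$

The second derivatives of u are continuous, since the first derivatives of M and
N are continuous, by hypothesis. Therefore

$$\frac{\partial^2 u}{\partial y\,\partial x} = \frac{\partial^2 u}{\partial x\,\partial y},$$

and so the condition (15.4-3) is satisfied.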

The sufficiency part of the proof consists in showing that if (15.4-3) holds
and u(x, y) is defined by (15.4-4), then $\partial u/\partial x = M$ and $\partial u/\partial y = N$. For this purpose we
express the line integral in terms of ordinary integrals. Along the horizontal part
of C, t = b and s varies from a to x. Hence dt = 0, and the line integral over this
part of C is equal to

$$\int_a^x M(s, b)\,ds.$$

On the vertical part of C, s = x and t varies from b to y. Since s is constant
during the integration, ds = 0, and the line integral over this part of C is equal to

$$\int_b^y N(x, t)\,dt.$$

Thus

$$u(x, y) = \int_a^x M(s, b)\,ds + \int_b^y N(x, t)\,dt. \tag{15.4-5}$$

We may now think of x and y as variables. Doing this, we see that the first
integral does not depend on y, while the second one involves y merely as a limit
of integration. Therefore, by Theorem VII, §1.52,

$$\frac{\partial u}{\partial y} = N(x, y).$$

All that remains to complete the proof is to show that $\partial u/\partial x = M$. To show this we
shall prove that

$$u(x, y) = \int_a^x M(s, y)\,ds + \int_b^y N(a, t)\,dt. \tag{15.4-6}$$


The required result will then follow when we differentiate with respect to x. The
key to this part of the proof is the consideration of the line integral over the
alternative path $C_2$ shown in Fig. 150. In this diagram the former path C is
denoted by $C_1$. The line integral over $C_2$ is precisely the expression on the
right in (15.4-6). If R is the rectangle enclosed by $C_1$ and $C_2$, the
counterclockwise boundary of R consists of $C_1$ and the reversal of $C_2$.
Therefore, by Green’s theorem,

$$\int_{C_1} M\,ds + N\,dt - \int_{C_2} M\,ds + N\,dt = \iint_R \left(\frac{\partial N}{\partial s} - \frac{\partial M}{\partial t}\right) ds\,dt.$$

But condition (15.4-3) in our present notation means that $\partial N/\partial s = \partial M/\partial t$. Therefore,
the line integrals over $C_1$ and $C_2$ are equal, and their common value is u(x, y).
Actual calculation of the integral over $C_2$, using s = a, ds = 0 on the first part,
and t = y, dt = 0 on the second part, gives us formula (15.4-6). The proof of
Theorem IV is then complete.

Fig. 150.

The fact is that the line integral defining u can be taken over any path from 
(a, b) to (x, y), so long as the path remains in the originally specified rectangle. 
This sort of thing is discussed further in §15.41. 

Example 1. Show that

$$(y - x^2)\,dx + (x + y^2)\,dy$$

is an exact differential in the whole xy-plane, and find a function of which it is
the differential.

Since

$$\frac{\partial}{\partial y}(y - x^2) = \frac{\partial}{\partial x}(x + y^2),$$

we know that we have an exact differential. If (a, b) is an arbitrary point,
formula (15.4-5) becomes

$$u(x, y) = \int_a^x (b - s^2)\,ds + \int_b^y (x + t^2)\,dt,$$

or, after a simple calculation,

$$u = -\frac{x^3}{3} + xy + \frac{y^3}{3} + \frac{a^3}{3} - ab - \frac{b^3}{3}.$$

The point (a, b) is arbitrary, so the last three terms may be lumped into a single
arbitrary constant C:

$$u = -\frac{x^3}{3} + xy + \frac{y^3}{3} + C.$$
In practice the function u is often found by a formal procedure based on
(15.4-5), but not bringing in the point (a, b) explicitly. Let us write

$$u = \phi(x) + F(x, y),$$

where $\phi(x)$ and F(x, y) represent the first and second terms, respectively, in
(15.4-5). Since $\partial F/\partial y = N(x, y)$, we may think of F as an indefinite integral of N
with respect to y, x being held constant. The condition $\partial u/\partial x = M$ means that

$$\phi'(x) = M - \frac{\partial F}{\partial x}.$$

Once F has been found we solve for $\phi(x)$ by integrating this last equation. The
variable y is not involved in the equation, for

$$\frac{\partial}{\partial y}\left(\frac{\partial F}{\partial x} - M\right) = \frac{\partial^2 F}{\partial x\,\partial y} - \frac{\partial M}{\partial y} = \frac{\partial N}{\partial x} - \frac{\partial M}{\partial y} = 0.$$

Example 2. We illustrate the method on the previous example.

$$N = x + y^2, \qquad F(x, y) = xy + \tfrac{1}{3}y^3;$$

$$u = \phi(x) + xy + \tfrac{1}{3}y^3,$$

$$\phi'(x) + y = M = y - x^2, \quad \text{or} \quad \phi'(x) = -x^2.$$

Thus

$$\phi(x) = -\tfrac{1}{3}x^3 + C,$$

$$u = -\tfrac{1}{3}x^3 + xy + \tfrac{1}{3}y^3 + C.$$
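
Formula (15.4-5) also lends itself to direct numerical evaluation. The following is a minimal sketch under stated assumptions: the base point (a, b) = (0, 0) is chosen so that the arbitrary constant vanishes, the helper names `integrate` and `potential` are hypothetical, and the one-dimensional integrals are approximated by the midpoint rule. The result is compared with the closed form found in Examples 1 and 2.

```python
def integrate(func, lo, hi, n=2000):
    """Midpoint-rule approximation of the integral of func from lo to hi
    (signed, so hi < lo is allowed)."""
    h = (hi - lo) / n
    return sum(func(lo + (i + 0.5) * h) for i in range(n)) * h

def M(x, y):
    return y - x ** 2

def N(x, y):
    return x + y ** 2

def potential(x, y, a=0.0, b=0.0):
    """u(x, y) from (15.4-5): horizontal leg from (a, b) to (x, b),
    then vertical leg from (x, b) to (x, y)."""
    horizontal = integrate(lambda s: M(s, b), a, x)
    vertical = integrate(lambda t: N(x, t), b, y)
    return horizontal + vertical

def closed_form(x, y):
    # The function found in the text, with the constant C taken as 0.
    return -x ** 3 / 3.0 + x * y + y ** 3 / 3.0

for point in [(1.5, -0.7), (-2.0, 1.0), (0.3, 2.5)]:
    print(point, potential(*point), closed_form(*point))
```

A different base point (a, b) changes the numerical value only by the constant a^3/3 - ab - b^3/3, in agreement with Example 1.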

Once we have found a function u such that du = M dx + N dy , it is easy to 
evaluate the line integral of the differential form, as the next theorem shows. 


THEOREM V. Let u = f(x, y) be a single-valued function with the differential
du = M dx + N dy at all points of some region. Let C be a curve lying in this
region, and let C have initial point $(x_0, y_0)$ and terminal point $(x_1, y_1)$. We
assume M and N continuous, and C sectionally smooth. Then

$$\int_C M\,dx + N\,dy = f(x_1, y_1) - f(x_0, y_0). \tag{15.4-7}$$

Proof. We shall give the proof for a smooth arc. For a sectionally smooth
curve the proof then follows by adding the results for constituent arcs. In terms
of a parameter t, going from $t_0$ to $t_1$ as (x, y) goes from $(x_0, y_0)$ to $(x_1, y_1)$ along C,

$$\int_C M\,dx + N\,dy = \int_{t_0}^{t_1} \left(M\,\frac{dx}{dt} + N\,\frac{dy}{dt}\right) dt = \int_{t_0}^{t_1} \frac{du}{dt}\,dt = u\,\Big|_{t_0}^{t_1} = f(x_1, y_1) - f(x_0, y_0).$$



Example 3. Evaluate the integral

$$\int_C \frac{x}{x^2 - y^2}\,dx + \frac{y}{y^2 - x^2}\,dy,$$

where C is a curve from (1, 0) to (5, 3) and lies between the lines y = x, y = -x.
We observe by inspection that

$$\frac{x\,dx}{x^2 - y^2} + \frac{y\,dy}{y^2 - x^2} = du,$$

where $u = \tfrac{1}{2}\log(x^2 - y^2)$. This function behaves properly as long as $x^2 > y^2$. Hence
the integral in question is equal to

$$\tfrac{1}{2}\log(25 - 9) - \tfrac{1}{2}\log(1 - 0) = \log 4.$$
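
The value log 4 and its independence of the path can be checked numerically. In the sketch below, the two paths (a straight segment, and a two-leg path through (5, 0)) are illustrative choices that stay in the region x^2 > y^2; the helper name `segment_integral` is hypothetical, and the midpoint rule is used.

```python
import math

def M(x, y):
    return x / (x * x - y * y)

def N(x, y):
    return y / (y * y - x * x)

def segment_integral(p0, p1, n=20000):
    """Midpoint-rule value of the integral of M dx + N dy along a straight segment."""
    (x0, y0), (x1, y1) = p0, p1
    dx = (x1 - x0) / n
    dy = (y1 - y0) / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        x = x0 + t * (x1 - x0)
        y = y0 + t * (y1 - y0)
        total += M(x, y) * dx + N(x, y) * dy
    return total

# Path A: straight from (1, 0) to (5, 3).
value_a = segment_integral((1.0, 0.0), (5.0, 3.0))

# Path B: along the x-axis to (5, 0), then vertically up to (5, 3).
value_b = (segment_integral((1.0, 0.0), (5.0, 0.0))
           + segment_integral((5.0, 0.0), (5.0, 3.0)))

print(value_a, value_b, math.log(4.0))
```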


15.41 / LINE INTEGRALS INDEPENDENT OF THE PATH 

Theorem V in the foregoing section shows that the line integral along C from
$(x_0, y_0)$ to $(x_1, y_1)$ has the same value for all curves C starting at $(x_0, y_0)$ and
ending at $(x_1, y_1)$, provided that C lies in the region where u = f(x, y) is defined
and du = M dx + N dy. Note, however, that we have emphasized the requirement
that f(x, y) be single-valued. The reason for insistence on this matter will be
more apparent after we have considered the following example.

Example 1. Consider the line integral

$$\int_C \frac{-y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy \tag{15.41-1}$$

along various curves from (1, 0) to (-1, 0).

In this case the differential form is exact at each point of the plane except
(0, 0), as the student should verify, using (15.4-3). It follows from Theorem IV
that in any open rectangle which does not contain the origin there is some
function defined whose differential is precisely the differential form appearing in
(15.41-1). As a matter of fact, if r, $\theta$ are polar co-ordinates, it is easy to verify
that

$$d\theta = \frac{-y\,dx + x\,dy}{x^2 + y^2}. \tag{15.41-2}$$

We leave it for the student to verify this by calculating dx and dy in terms of dr
and $d\theta$ from the equations $x = r\cos\theta$, $y = r\sin\theta$.

If we now attempt to find the value of the line integral (15.41-1) by applying 
Theorem V, we take 6 = /(x, y). The question then arises: In what region of the 
plane may we regard 6 as a single-valued differentiable function of x and y? 
Some standard procedure must be adopted so that a unique value of 6 is 
assigned to each point. One possibility is to require 0 ^ 0 < 2ir. Then 0 is a 
single-valued function defined at each point (x, y) except (0, 0). The function is 
discontinuous (and hence not differentiable) at points on the positive x-axis, 
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however, for the value of 0 experiences a sudden jump as (x, y) crosses the 
positive x-axis. The value of 6 might be standardized in a different way by 
requiring that -it <0 ^ tt; this would make 0 discontinuous along the negative 
x-axis. There are infinitely many other ways of standardizing the definition of 0 
as a single-valued function, but there is no way in which 0 can be defined so as 
to be single-valued and differentiable simultaneously at all points other than 
(0, 0). Whatever method is used, 0 must have at least one point of discontinuity 
on any closed curve which encircles the origin. 

Now consider the line integral (15.41-1). If C goes from (1, 0) to (-1, 0) 
along a route which does not go through the origin or cross the negative y-axis, 
we may standardize $\theta$ by the requirement $-\pi/2 < \theta \le 3\pi/2$. The value of the 
integral is then found by Theorem V; it is 

$$\int_0^{\pi} d\theta = \pi.$$

For a path from (1, 0) to (-1, 0) in the lower half plane, we can define $\theta$ so that the 
discontinuities occur in the upper half plane, say by requiring $-3\pi/2 < \theta \le \pi/2$. 
Then the value of the line integral is 

$$\int_0^{-\pi} d\theta = -\pi.$$

Of course, the values of the line integrals for these two paths do not depend on 
the particular method which is chosen to standardize $\theta$. 
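
The two values can be checked by direct numerical integration of $(-y\,dx + x\,dy)/(x^2 + y^2)$. The sketch below is only illustrative: the midpoint-type rule, the step count, and the three sample paths (`upper`, `lower`, `detour`) are our own choices. The first and third paths avoid the origin and the negative y-axis; the second lies in the lower half plane.

```python
import math

def line_integral(path, n=30000):
    """Approximate the integral of (-y dx + x dy)/(x^2 + y^2) along path(t), 0 <= t <= 1."""
    total = 0.0
    for k in range(n):
        t0, t1 = k / n, (k + 1) / n
        x0, y0 = path(t0)
        x1, y1 = path(t1)
        xm, ym = path(0.5 * (t0 + t1))  # integrand evaluated at the midpoint
        total += (-ym * (x1 - x0) + xm * (y1 - y0)) / (xm * xm + ym * ym)
    return total

def upper(t):
    """Upper unit semicircle from (1, 0) to (-1, 0)."""
    return (math.cos(math.pi * t), math.sin(math.pi * t))

def lower(t):
    """Lower unit semicircle from (1, 0) to (-1, 0)."""
    return (math.cos(math.pi * t), -math.sin(math.pi * t))

def detour(t):
    """Polygonal route (1,0) -> (1,1) -> (-1,1) -> (-1,0), staying in y >= 0."""
    if t <= 1 / 3:
        return (1.0, 3.0 * t)
    if t <= 2 / 3:
        return (1.0 - 6.0 * (t - 1 / 3), 1.0)
    return (-1.0, 1.0 - 3.0 * (t - 2 / 3))

upper_value = line_integral(upper)
lower_value = line_integral(lower)
detour_value = line_integral(detour)
print(upper_value, lower_value, detour_value)
```

The upper semicircle and the polygonal detour should agree with each other (both near $\pi$), while the lower semicircle gives a value near $-\pi$.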


The foregoing example shows the following: If $M\,dx + N\,dy$ is a differential 
form such that $\dfrac{\partial M}{\partial y} = \dfrac{\partial N}{\partial x}$ at all points of a region D, it is not necessarily true that 
$M\,dx + N\,dy$ is the differential of a function which is single-valued and differentiable 
at all points of D. This raises the question: Is there some kind of a 
condition which is sufficient to guarantee that if $\dfrac{\partial M}{\partial y} = \dfrac{\partial N}{\partial x}$ in D, then there exists 
a single-valued function $u = f(x, y)$ such that $du = M\,dx + N\,dy$ in D? 

There is such a condition. To explain it we must introduce the concept of a 
simply connected region. 


Definition. A connected open set D is called a simply connected region if it has the 
property that whenever a simple closed curve C lies in D , all points inside C are also 
in D. If D is not simply connected it is called multiply connected. 

The property of being simply connected is a property “in the large.” The 
interior of a circle or rectangle is simply connected. The region between two 
concentric circles is multiply connected. So is the region consisting of the entire 
plane with the exception of one or any finite number of points. 

Using this new concept, we can state the following theorem: 

THEOREM VI. Let D be a simply connected region. Let M and N have 
continuous first partial derivatives in D, such that 

$$\frac{\partial M}{\partial y} = \frac{\partial N}{\partial x} \qquad (15.41\text{-}3)$$

at each point. Then there is a function $u = f(x, y)$, single-valued and 
differentiable in D, such that $du = M\,dx + N\,dy$. 


Proof. Let $(x_0, y_0)$ and $(x_1, y_1)$ be any points of D. Consider paths C from $(x_0, y_0)$ 
to $(x_1, y_1)$, restricted as follows: C consists of a finite number of straight line 
segments joined end to end, each segment being parallel to either the x or y axis 
(see Fig. 151). Further, C is to lie in D, and must not intersect itself. Such a 
curve C will be called an elementary path. Since D is connected, it can be shown 
that any two points of D can be joined by an elementary path in D. We omit the 
details of this. 

Fig. 151. 

Now, the idea of the proof is to show that 

$$\int_C M\,dx + N\,dy$$

has the same value for any two elementary paths in D joining $(x_0, y_0)$ to $(x_1, y_1)$. 
Then, keeping $(x_0, y_0)$ fixed, we define $f(x_1, y_1)$ as the value of the line integral. 
Since the value depends only on $(x_1, y_1)$, and not on the particular elementary 
path, we get a single-valued function defined throughout D. 

Suppose then that $C_1$ and $C_2$ are two such elementary paths in D from 
$(x_0, y_0)$ to $(x_1, y_1)$. They may coincide along certain line segments, but the 
noncoincident portions will have just a finite number of intersections, and these 
portions will form the boundaries of a finite number of regions. Each of these 
regions will lie entirely in D because of the assumption that D is simply 
connected. The situation is like that suggested in Fig. 152, where these latter 
regions are shaded. If R is any one of these regions, the line integral around its 
complete boundary in the counterclockwise sense is zero, for the line integral 
is equal to 

$$\iint_R \left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right) dx\,dy = 0$$

by Green's theorem and the hypothesis (15.41-3). From this it is readily seen 
that the integrals over $C_1$ and $C_2$ are equal. 

Fig. 152. 

We now drop the subscripts on $(x_1, y_1)$, and define $f(x, y)$ as the line integral 
from $(x_0, y_0)$ to $(x, y)$ over any elementary path in D. This function has partial 
derivatives 

$$\frac{\partial f}{\partial x} = M, \qquad \frac{\partial f}{\partial y} = N. \qquad (15.41\text{-}4)$$

In fact, if we choose the path so that the last segment of C is horizontal, say 
from $(a, y)$ to $(x, y)$, then 

$$f(x, y) = f(a, y) + \int_a^x M(s, y)\,ds,$$

and so the first equation (15.41-4) must hold. The second equation also holds, by 
a similar argument in which the last segment of the path is parallel to the y-axis. 
This completes the proof of Theorem VI. 

Make sure that you see where the proof would break down if D were not 
simply connected. 

The restriction to elementary paths is purely for the purposes of the proof of 
Theorem VI. Once the theorem is proved, we see that the line integral has the 
same value over any path from $(x_0, y_0)$ to $(x_1, y_1)$. The proof is by Theorem V, 
where in this case $f(x_0, y_0) = 0$. 
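
The construction used in the proof can be imitated numerically. The sketch below is a hedged illustration, not part of the text: the form $2xy\,dx + x^2\,dy$ (which satisfies (15.41-3) in the whole plane), the base point (0, 0), and the helper names are all our own assumptions. It integrates along two different elementary paths and compares the results with the function $x^2 y$, whose differential is the given form.

```python
def M(x, y):
    return 2.0 * x * y      # illustrative coefficient of dx

def N(x, y):
    return x * x            # illustrative coefficient of dy

def horizontal(y, xa, xb, n=2000):
    """Midpoint-rule integral of M(x, y) dx along the horizontal segment from xa to xb."""
    h = (xb - xa) / n
    return sum(M(xa + (k + 0.5) * h, y) for k in range(n)) * h

def vertical(x, ya, yb, n=2000):
    """Midpoint-rule integral of N(x, y) dy along the vertical segment from ya to yb."""
    h = (yb - ya) / n
    return sum(N(x, ya + (k + 0.5) * h) for k in range(n)) * h

def f_x_then_y(x, y):
    """Elementary path (0,0) -> (x,0) -> (x,y)."""
    return horizontal(0.0, 0.0, x) + vertical(x, 0.0, y)

def f_y_then_x(x, y):
    """Elementary path (0,0) -> (0,y) -> (x,y)."""
    return vertical(0.0, 0.0, y) + horizontal(y, 0.0, x)

value_1 = f_x_then_y(1.5, 2.0)
value_2 = f_y_then_x(1.5, 2.0)
print(value_1, value_2, 1.5 ** 2 * 2.0)
```

Since the whole plane is simply connected, the two paths must give the same value, and that value is $f(x, y) = x^2 y$ at the chosen point.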

A non-simply connected region can often be made simply connected by 
introducing certain barriers in the form of curves whose points are no longer 
considered as belonging to the region. Such curves are called cuts. For 
example, let D be the whole plane except for the origin. This is a non-simply 
connected region. But if we introduce the positive x-axis as a cut, the modified 
region (call it $D_1$) is simply connected. No curve enclosing the origin can lie 
wholly in $D_1$, and it is the fact that such curves do lie in D that makes D 
non-simply connected. As another example, consider the region D shown in 
Fig. 153, lying inside the large curve and outside the two small ones. It is 
multiply connected, but it may be rendered into a simply connected region by 
making two cuts, as shown in Fig. 154. The student will readily see that the 
choice of cuts is not unique. The number of cuts needed in a given case is 
always the same, however, no matter how they are made. 

Fig. 153. 

EXERCISES 

1. Test each form for exactness, and if it is exact, find a function of which the form 
is the differential. Practice the two methods illustrated in Examples 1 and 2, §15.4. The 
point $(a, b)$ may be taken as (0, 0). 

(a) $x\,dy + (y - 7)\,dx$. 

(b) $(2y^2 - 3x)\,dx - 4xy\,dy$. 







(c) $(2x + 5y - 7)\,dx + (5x - 8y + 3)\,dy$. 

(d) $(2xy + x^2)\,dx + x^2\,dy$. 

(e) $x e^{xy} \sin y\,dx + (e^{xy} \cos y + y)\,dy$. 

(f) $(xy \cos xy + \sin xy)\,dx + x^2 \cos xy\,dy$. 

(g) $(4x^3 + 10xy^3 - 3y^4)\,dx + (15x^2y^2 - 12xy^3 + 5y^4)\,dy$. 

(h) $(e^x \sin y - y)\,dx + (e^x \cos y - x - 2)\,dy$. 

2. (a) Find a function u such that 

du = dx - 2-- J y dy, 


and describe the region or regions in which u is differentiable, 
(b) Find the value of the line integral 


L 


\ + y 


dx 


y + x y 


- dy 


from (1,0) to (5,2); from (—3,0) to (-1,4). In each case specify any essential limitations 
on the path C. 

3. (a) Find a function u such that 


. _ dy x dx 

yVP-F + y 2 -* 2 ’ 

and describe the region or regions in which u is differentiable. 

(b) Find the line integral of the differential form in (a) from (3, 5) to (5, 13), and specify 
any necessary limitations on the path. 

4. Find a function of x alone, $w = \phi(x)$, which makes the differential form 
$w(x \sin y + y \cos y)\,dx + w(x \cos y - y \sin y)\,dy$ exact; then find the function of which it 
is the differential, if this function is equal to 0 at (0, 0). 

5. Let $P_1$ be (1, 0), $P_2$ be (-1, 0), and P be (x, y). Let $\theta_1$ and $\theta_2$ be the angles between 
the positive x-axis and $P_1P$ and $P_2P$ respectively. Let $u = \theta_1 + \theta_2$. Show that 

$$du = -y\left(\frac{1}{r_1^2} + \frac{1}{r_2^2}\right) dx + \left(\frac{x - 1}{r_1^2} + \frac{x + 1}{r_2^2}\right) dy,$$

where $r_1 = P_1P$, $r_2 = P_2P$. To make u a single-valued function it is necessary to make 
some definite agreement about the values of $\theta_1$ and $\theta_2$ at all points except $P_1$ and $P_2$. 

(a) If it is agreed that $-\pi < \theta_1 \le \pi$ and $0 \le \theta_2 < 2\pi$, show that u is discontinuous if $y = 0$ 
and $x^2 > 1$. By making cuts along the lines of discontinuity, we get a simply connected 
region in which u is differentiable. 

(b) If it is agreed that $0 \le \theta_1 < 2\pi$ and $0 \le \theta_2 < 2\pi$, where are the discontinuities of u? 

(c) If $v = \theta_1 - \theta_2$ and the angles are chosen as in (b), where is v discontinuous? 


15.5 / FURTHER DISCUSSION OF SURFACE AREA 

In §14.6 we arrived at the formula 

$$A = \iint_R \sqrt{EG - F^2}\,du\,dv \qquad (15.5\text{-}1)$$

for the area of a surface represented parametrically, the parameters (u, v) 
ranging over a region R in the uv-plane. In the earlier discussion of this formula 
we pointed out that it is logically desirable to show that the number A found by 
the formula is really not dependent on the particular parametric representation 
of the surface which is used in calculating A. By using Theorem III, §15.32, it is 
possible to give an analytic proof that A is independent of the parametrization. 

Let another parametrization be given, in terms of parameters (r, t) ranging 
over a region R' in the rt-plane. Then, there is a one-to-one correspondence 
between the points of R' and those of R, since both of these regions are mapped 
in one-to-one fashion onto the same surface. Thus we may consider u and v as 
functions of r and t. There are now two formulas for $ds^2$ on the surface: 

$$ds^2 = E\,du^2 + 2F\,du\,dv + G\,dv^2, \qquad (15.5\text{-}2)$$

$$ds^2 = E'\,dr^2 + 2F'\,dr\,dt + G'\,dt^2. \qquad (15.5\text{-}3)$$

The prime notation here has nothing to do with differentiation; we use it simply 
to indicate that E', F', G' are related to r and t in the same way that E, F, G are 
related to u and v. 

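What we wish to prove is that 

$$\iint_R \sqrt{EG - F^2}\,du\,dv = \iint_{R'} \sqrt{E'G' - F'^2}\,dr\,dt. \qquad (15.5\text{-}4)$$

Now, regarding u and v as functions of r and t, we have 

$$du = \frac{\partial u}{\partial r}\,dr + \frac{\partial u}{\partial t}\,dt,$$

and a similar formula for dv. We substitute these expressions for du and dv into 
(15.5-2), whereupon we get an expression for $ds^2$ in terms of dr and dt. If this 
expression is compared with (15.5-3), we see that 

$$\begin{aligned}
E' &= E\left(\frac{\partial u}{\partial r}\right)^2 + 2F\,\frac{\partial u}{\partial r}\frac{\partial v}{\partial r} + G\left(\frac{\partial v}{\partial r}\right)^2, \\
F' &= E\,\frac{\partial u}{\partial r}\frac{\partial u}{\partial t} + F\left(\frac{\partial u}{\partial r}\frac{\partial v}{\partial t} + \frac{\partial u}{\partial t}\frac{\partial v}{\partial r}\right) + G\,\frac{\partial v}{\partial r}\frac{\partial v}{\partial t}, \\
E'' &\text{ is not used; } G' = E\left(\frac{\partial u}{\partial t}\right)^2 + 2F\,\frac{\partial u}{\partial t}\frac{\partial v}{\partial t} + G\left(\frac{\partial v}{\partial t}\right)^2.
\end{aligned} \qquad (15.5\text{-}5)$$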


Now let us consider the expression $E'G' - F'^2$. For convenience, we write 

$$u_1 = \frac{\partial u}{\partial r}, \qquad u_2 = \frac{\partial u}{\partial t}, \qquad v_1 = \frac{\partial v}{\partial r}, \qquad v_2 = \frac{\partial v}{\partial t}.$$

Then, after carefully writing out the products involved in $E'G'$ and $F'^2$ from 
(15.5-5), we arrive at the final result 

$$E'G' - F'^2 = (EG - F^2)(u_1 v_2 - u_2 v_1)^2,$$

or 

$$\sqrt{E'G' - F'^2} = \sqrt{EG - F^2}\,\left|\frac{\partial(u, v)}{\partial(r, t)}\right|. \qquad (15.5\text{-}6)$$


Let us now transform the integral (15.5-1) by changing variables from (u, v) 
to (r, t). By Theorem III, §15.32, the result is 

$$\iint_R \sqrt{EG - F^2}\,du\,dv = \iint_{R'} \sqrt{EG - F^2}\,\left|\frac{\partial(u, v)}{\partial(r, t)}\right| dr\,dt.$$

This formula and (15.5-6) give us (15.5-4), and our proof is thus complete. 
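
The algebraic identity behind (15.5-6) is tedious to expand by hand, so a numerical spot check may be reassuring. The sketch below assumes the coefficient formulas obtained by substituting $du = u_1\,dr + u_2\,dt$, $dv = v_1\,dr + v_2\,dt$ into (15.5-2); the function name and the random sampling are ours and serve only as an illustration.

```python
import random

def transformed_coefficients(E, F, G, u1, u2, v1, v2):
    """E', F', G' after substituting du = u1 dr + u2 dt, dv = v1 dr + v2 dt into ds^2."""
    E_p = E * u1 * u1 + 2.0 * F * u1 * v1 + G * v1 * v1
    F_p = E * u1 * u2 + F * (u1 * v2 + u2 * v1) + G * v1 * v2
    G_p = E * u2 * u2 + 2.0 * F * u2 * v2 + G * v2 * v2
    return E_p, F_p, G_p

random.seed(0)
max_error = 0.0
for _ in range(1000):
    E, F, G, u1, u2, v1, v2 = (random.uniform(-2.0, 2.0) for _ in range(7))
    E_p, F_p, G_p = transformed_coefficients(E, F, G, u1, u2, v1, v2)
    lhs = E_p * G_p - F_p ** 2
    rhs = (E * G - F ** 2) * (u1 * v2 - u2 * v1) ** 2   # Jacobian squared
    max_error = max(max_error, abs(lhs - rhs))

print(max_error)
```

The identity is purely algebraic, so it holds for arbitrary sample values and not only for those arising from an actual surface; the largest discrepancy found should be at the level of rounding error.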


15.51 / SURFACE INTEGRALS 

A surface integral is a natural generalization of a double integral, and may be 
used for applications to such things as finding centers of gravity and moments of 
inertia of curved laminas, the potentials and components of force due to 
distributions of electrostatic charge on surfaces, and other quantities having 
physical or geometrical significance. 

The direct intuitive formulation of the concept of a surface integral is as 
follows: Let S be a surface, and let $\phi(P)$ be a point function defined on S. 
Divide S into a number of surface elements $\Delta S_1, \ldots, \Delta S_n$, of areas 
$\Delta A_1, \ldots, \Delta A_n$. Let $P_k$ be some point in the element $\Delta S_k$. Then the surface 
integral of $\phi$ over S is defined as 

$$\iint_S \phi\,dA = \lim \sum_{k=1}^{n} \phi(P_k)\,\Delta A_k, \qquad (15.51\text{-}1)$$

where the limit is taken as the maximum of the dimensions of the elements $\Delta S_k$ 
approaches zero. 

In case the surface S is flat and lies in the xy-plane, this definition takes the 
same form as the expression of a double integral in Theorem II, §13.23. 

If $\phi(P)$ is expressed as a function $F(x, y, z)$, where (x, y, z) are the co-ordinates 
of P, the surface integral is denoted by 

$$\iint_S F(x, y, z)\,dA. \qquad (15.51\text{-}2)$$

The conception of a surface integral is independent of all co-ordinate 
systems. It is also independent of the choice of parametric representation for the 
surface. However, to work with surface integrals, we must learn how to express 
surface integrals as ordinary double integrals. This is done with the aid of 
equations for the surface, whether in parametric form or otherwise. 

Consider first the case in which S is defined by an equation $z = f(x, y)$, 
where (x, y) ranges over a region R in the xy-plane. In this case 

$$\iint_S F(x, y, z)\,dA = \iint_R F(x, y, z) \sec\gamma\,dx\,dy, \qquad (15.51\text{-}3)$$

where, in the integral on the right, 

$$z = f(x, y), \qquad \sec\gamma = \left[1 + \left(\frac{\partial z}{\partial x}\right)^2 + \left(\frac{\partial z}{\partial y}\right)^2\right]^{1/2};$$

$\gamma$ is the acute angle between the normal to S at (x, y, z) and the positive z-axis. 
The derivation of (15.51-3) rests on the fact that if $\Delta S$ is a small piece of S 
which projects into a small rectangular cell of dimensions $\Delta x$ by $\Delta y$ in the region 
R, then the area $\Delta A$ of $\Delta S$ is given by 

$$\Delta A = \sec\gamma\,\Delta x\,\Delta y,$$

where $\gamma$ is evaluated at some point in $\Delta S$ (see the discussion following (14.6-12) 
in §14.6). 

Example 1. Find the moment of inertia of the hemispherical surface 
$z = \sqrt{a^2 - x^2 - y^2}$ about the x-axis, assuming the surface to be a homogeneous 
lamina of mass M. See Fig. 155. 

The required moment of inertia is 

$$I = \iint_S \sigma (y^2 + z^2)\,dA,$$

where $\sigma$ is the constant density. Thus, by (15.51-3), 

$$I = \iint_R \sigma (y^2 + z^2) \sec\gamma\,dx\,dy,$$

where R is the circular region $x^2 + y^2 \le a^2$. Now 

$$\sec\gamma = \frac{a}{z} = \frac{a}{\sqrt{a^2 - x^2 - y^2}}$$

and $y^2 + z^2 = a^2 - x^2$. Thus 

$$I = \sigma a \iint_R \frac{a^2 - x^2}{\sqrt{a^2 - x^2 - y^2}}\,dx\,dy.$$

It is convenient to use polar co-ordinates to evaluate this integral. The iterated 
integral is 

$$I = \sigma a \int_0^{2\pi} d\theta \int_0^a \frac{a^2 - r^2 \cos^2\theta}{\sqrt{a^2 - r^2}}\,r\,dr.$$

We leave most of the calculations to the student. 

$$I = \sigma a \int_0^{2\pi} \left(a^3 - \frac{2a^3}{3} \cos^2\theta\right) d\theta = \frac{4\pi\sigma a^4}{3}.$$

Since $M = 2\pi a^2 \sigma$, we have $I = \frac{2}{3} M a^2$. 

Fig. 155. 
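
The value found in Example 1 can be confirmed numerically. The sketch below evaluates the iterated polar integral by the midpoint rule. To avoid the integrable singularity at $r = a$ it uses the substitution $r = a \sin s$, which is our own device and not part of the text; the function name, grid sizes, and sample values of $a$ and $\sigma$ are likewise assumptions.

```python
import math

def hemisphere_inertia(a, sigma, n_theta=400, n_s=400):
    """Midpoint-rule value of
    sigma*a * int_0^{2pi} int_0^a (a^2 - r^2 cos^2(theta)) r / sqrt(a^2 - r^2) dr dtheta,
    computed after the substitution r = a sin(s), 0 <= s <= pi/2."""
    d_theta = 2.0 * math.pi / n_theta
    d_s = 0.5 * math.pi / n_s
    total = 0.0
    for i in range(n_theta):
        cos2 = math.cos((i + 0.5) * d_theta) ** 2
        for j in range(n_s):
            s = (j + 0.5) * d_s
            r = a * math.sin(s)
            # Under r = a sin(s), the factor r dr / sqrt(a^2 - r^2) becomes a sin(s) ds.
            total += (a * a - r * r * cos2) * a * math.sin(s) * d_s * d_theta
    return sigma * a * total

a, sigma = 2.0, 1.5
numeric_I = hemisphere_inertia(a, sigma)
mass = 2.0 * math.pi * a * a * sigma          # M = 2 pi a^2 sigma
print(numeric_I, 2.0 / 3.0 * mass * a * a)
```

The two printed numbers should agree to several decimal places, in accordance with $I = \frac{2}{3} M a^2$.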

If the surface is represented parametrically, with parameters (u, v), the 
appropriate expression for $\Delta A$ is 

$$\Delta A = \sqrt{EG - F^2}\,\Delta u\,\Delta v$$

(see the discussion leading up to (14.6-6)), and, if $\phi(P)$ is expressed as some 
function $f(u, v)$, we have 

$$\iint_S f(u, v)\,dA = \iint_R f(u, v) \sqrt{EG - F^2}\,du\,dv, \qquad (15.51\text{-}4)$$

where R is the region in the uv-plane which corresponds to the surface S. Here 
one should also remember that 

$$EG - F^2 = j_1^2 + j_2^2 + j_3^2$$

(see (14.6-7)). 

Example 2. Find the electrostatic potential at (0, 0, -a) of a uniformly 
distributed total charge e on the hemisphere in Example 1. 

If $\sigma$ is the constant density of charge, the potential in question is 

$$u = \iint_S \frac{\sigma}{[x^2 + y^2 + (z + a)^2]^{1/2}}\,dA.$$

The student may wish to consult §13.51 to refresh his memory on the concept of 
potential. See (13.51-4), in particular. 

We use the parametric representation $x = a \sin\phi \cos\theta$, $y = a \sin\phi \sin\theta$, 
$z = a \cos\phi$. In this case, with $u = \phi$, $v = \theta$, it is readily found that 

$$ds^2 = a^2\,d\phi^2 + a^2 \sin^2\phi\,d\theta^2,$$

$$E = a^2, \qquad F = 0, \qquad G = a^2 \sin^2\phi.$$

Also 

$$[x^2 + y^2 + (z + a)^2]^{1/2} = 2a \cos\frac{\phi}{2}.$$

This result may be read directly from Fig. 156, or it may be worked out 
analytically. 

The potential is thus 

$$u = \iint_S \frac{\sigma\,dA}{2a \cos(\phi/2)} = \sigma \int_0^{2\pi} d\theta \int_0^{\pi/2} \frac{a^2 \sin\phi}{2a \cos(\phi/2)}\,d\phi.$$

Since $\sin\phi = 2 \sin\frac{\phi}{2} \cos\frac{\phi}{2}$, we have 

$$u = 2\pi\sigma a \int_0^{\pi/2} \sin\frac{\phi}{2}\,d\phi = 2\pi\sigma a (2 - \sqrt{2}).$$

Since $e = 2\pi a^2 \sigma$, the result can be written 

$$u = \frac{(2 - \sqrt{2})\,e}{a}.$$
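
As a check on Example 2, the surface integral can be evaluated numerically from the parametric form, computing the distance to (0, 0, -a) directly from the co-ordinates rather than from the half-angle formula. This is only a sketch: the function name, the step count, and the sample values of $a$ and $\sigma$ are our own assumptions.

```python
import math

def hemisphere_potential(a, sigma, n=4000):
    """Potential at (0, 0, -a) of the hemisphere z = sqrt(a^2 - x^2 - y^2) with charge density sigma.
    Uses dA = a^2 sin(phi) dphi dtheta; the integrand does not depend on theta,
    so the theta-integration contributes a factor 2*pi."""
    d_phi = 0.5 * math.pi / n
    total = 0.0
    for k in range(n):
        phi = (k + 0.5) * d_phi
        x = a * math.sin(phi)          # point of the surface with theta = 0
        z = a * math.cos(phi)
        distance = math.sqrt(x * x + (z + a) ** 2)
        total += sigma * a * a * math.sin(phi) / distance * d_phi
    return 2.0 * math.pi * total

a, sigma = 2.0, 0.5
charge = 2.0 * math.pi * a * a * sigma            # e = 2 pi a^2 sigma
numeric_u = hemisphere_potential(a, sigma)
closed_form_u = (2.0 - math.sqrt(2.0)) * charge / a
print(numeric_u, closed_form_u)
```

The numerical value should agree closely with $(2 - \sqrt{2})\,e/a$.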



EXERCISES 

1. Locate the centroid of the hemispherical surface $z = (a^2 - x^2 - y^2)^{1/2}$, using a 
surface integral. 

2. Let S be the surface defined by $x^2 + y^2 + z^2 = 4$, $z \ge 1$. 

(a) Find the value of $\iint_S (x^2 + y^2) z\,dA$, using (15.51-3). 

(b) Find the value of $\iint_S [x^2 + y^2 + (z - 2)^2]\,dA$, using (15.51-4) and the parameters $\theta$, $\phi$ as 
in Example 2. 

(c) Find the value of $\iint_S (x^2 + y^2)\,dA$. 

3. Calculate f f -~=M^=== where S is defined by 2z = x 2 + 2y, 0 ^ x ^ 1, 0 ^ y ^ 1 . 

J J Vz - y + 1 

s 

4. Compute the value of JJ(x 2 + y 2 -2z 2 ) dA where S is the entire surface x 2 + y 2 + 

s 


5. Compute the value of J J -—=======^ where S is the part of the surface x 2 + y 2 + 

s 

(z - a) 2 = a 2 which is inside the cylinder x 2 + y 2 = ay and underneath the plane z = a. 

6. Find the value of $\iint_S xyz\,dA$, where S is the part of $x^2 + z^2 = 4$ in the first octant 
and between $y = 0$, $y = 1$. 

7. Locate the centroid of the area on the surface of the sphere $x^2 + y^2 + z^2 = 4a^2$, 
inside the cylinder $x^2 + y^2 = 2ax$, and in the first octant. 

8. If the surface in Exercise 7 is a homogeneous lamina, find its moment of inertia 
about the z-axis. 

9. (a) If the cone $az = b\sqrt{x^2 + y^2}$ is parametrized by setting $x = r \cos\theta$, $y = r \sin\theta$, 
$z = br/a$, show that $EG - F^2 = (a^2 + b^2) r^2 / a^2$. 

(b) If S is the part of the cone for which $0 \le z \le b$, and if S is a homogeneous lamina, 
show that its moment of inertia about the z-axis is $\frac{1}{2} M a^2$, using a surface integral and 
(15.51-4). How does this compare with using (15.51-3) and polar co-ordinates in the 
xy-plane? 

10. On the cylinder $x^2 + y^2 = 1$, $0 \le z \le 2$ use the $\theta$ and z of cylindrical co-ordinates 
as parameters, and calculate the value of the surface integral giving the electrostatic field 
strength produced at (0, 0, 0) by a uniform density of charge on the cylinder. 

11. Let S be a part of the surface of the sphere $x^2 + y^2 + z^2 = a^2$, and let it carry a 
uniformly distributed total mass M. Show that the gravitational field produced at the 
origin by this mass is $\mathbf{F} = (M/a^3)\,\mathbf{R}$, where $\mathbf{R}$ is the vector from O to the center of mass of 
S. 

12. Suppose that S in Exercise 11 lies entirely on the hemisphere $z \ge 0$. Show that 
the z-component of the field at O is $\sigma A / a^2$, where $\sigma$ is the density and A is the area of the 
projection of S on the xy-plane. 

13. If the torus of Fig. 129 (§14.5) carries a homogeneously distributed total mass M, 
show that its moment of inertia about the z-axis is $\frac{1}{2} M (2a^2 + 3b^2)$. 

14. Let S be the sphere $x^2 + y^2 + z^2 = a^2$, and let it carry a uniform distribution of 
charge. Show that the field produced at (0, 0, c) is of magnitude 0 if $|c| < a$, $e/c^2$ if $c > a$, 
and $\frac{1}{2} e/c^2$ if $c = a$. The substitution $t^2 = a^2 + c^2 - 2ac \cos\phi$ will be found useful in this 
exercise, as well as in Exercises 15, 16. 

15. Solve the problem corresponding to Exercise 14 if the density of charge is 
$\sigma = \cos\phi$, where $\phi$ is the angle which OP makes with the positive z-axis. 

16. Find the potential u at (0, 0, c) of a uniform distribution of charge on the 

hemisphere 2 = (a 2 -x 2 - y 2 ) 1 ' 2 . Then calculate to get the force. 

dc 


15.6 / THE DIVERGENCE THEOREM 

The divergence theorem is a three-dimensional analogue of Green's theorem in 
the plane (§15.3). The main content of the theorem is the formula 

$$\iiint_T \left(\frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z}\right) dV = \iint_S (P \cos\alpha + Q \cos\beta + R \cos\gamma)\,dA. \qquad (15.6\text{-}1)$$

Here T is a region in three-dimensional space; S is the surface bounding T; P, 
Q, R are functions of x, y, z which are continuous and have continuous first 
partial derivatives in T; and $\cos\alpha$, $\cos\beta$, $\cos\gamma$ are the direction cosines of the 
line normal to S, directed outward from T. 

The hypotheses covering T and S need to be made more precise. In the 
usual applications of formula (15.6-1) we may wish to take T to be one of the 
common solids, e.g., a cube, a sphere, or a right circular cylinder. Or, T might be 
the region contained between two concentric spheres, or the region which is left 
when a doughnut-shaped region is removed from the interior of a large ellipsoid. 
Thus, the surface S of T may actually consist of several detached pieces; in the 
last-mentioned example, S consists of the surface of the ellipsoid and the 
surface of the doughnut (a torus). 

We saw in §15.31 that there are not inconsiderable difficulties in the way of 
proving Green's theorem in the generality with which it is stated in §15.3. There 
are difficulties of a corresponding kind, but even greater, in connection with the 
divergence theorem. It is not even an easy or brief matter to formulate 
reasonably general conditions on the region T such that (15.6-1) may be shown 
to hold true. However, if T is a sufficiently simple type of region, it is quite easy 
to prove formula (15.6-1). To describe the kind of region we have in mind let us 
begin with a description of the boundary. Let G be a regular region in the 
xy-plane. Let $Z_1(x, y)$ and $Z_2(x, y)$ be continuous functions defined in G and 
having continuous first partial derivatives there, and such that $Z_1 < Z_2$ at each 
interior point of G. Let $S_1$ and $S_2$ be the surfaces defined by $z = Z_k(x, y)$, 
$k = 1, 2$. Consider the cylindrical surface formed by erecting lines parallel to the 
z-axis at points of the boundary of G. Let $S_3$ be the portion of this cylindrical 
surface which is cut off between the surfaces $S_1$, $S_2$. It may happen that $Z_1 = Z_2$ 
on the boundary of G; in this event there is no surface $S_3$. Now let T be the 
region bounded above by $S_2$, below by $S_1$, and laterally by $S_3$. (See Fig. 157, page 
486.) A region T formed in this manner will be called xy-simple. In a similar manner 
we may define what we mean by a yz-simple region or a zx-simple region. The 
region defined by $1 \le x^2 + y^2 \le 4$, $0 \le z \le 1$ is xy-simple, but not yz-simple or 
zx-simple. The region defined by $x^2 + z^2 \le y$, $0 \le y \le 4$ is zx-simple. 

We now prove the following lemma: 


LEMMA. Let T be an xy-simple region, and let S be the entire surface of T. 
Suppose that $F(x, y, z)$ and $\dfrac{\partial F}{\partial z}$ are continuous in T. Let $\gamma$ be the angle 
between the positive z-axis and the outward drawn normal to S. Then 

$$\iiint_T \frac{\partial F}{\partial z}\,dV = \iint_S F \cos\gamma\,dA. \qquad (15.6\text{-}2)$$

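Proof. We begin by expressing the triple integral in (15.6-2) as an iterated 
integral in which the first integration is with respect to z. We have 

$$\iiint_T \frac{\partial F}{\partial z}\,dV = \iint_G dx\,dy \int_{Z_1}^{Z_2} \frac{\partial F}{\partial z}\,dz.$$

In the z-integration, x and y are held constant. Hence 

$$\int_{Z_1}^{Z_2} \frac{\partial F}{\partial z}\,dz = F(x, y, Z_2) - F(x, y, Z_1),$$

and the triple integral becomes 

$$\iiint_T \frac{\partial F}{\partial z}\,dV = \iint_G [F(x, y, Z_2) - F(x, y, Z_1)]\,dx\,dy. \qquad (15.6\text{-}3)$$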

Now consider the surface integral in (15.6-2). On the lateral surface $S_3$ of T we 
see that $\cos\gamma = 0$, for $\gamma = \pi/2$. Hence 

$$\iint_S F \cos\gamma\,dA = \iint_{S_1} F \cos\gamma\,dA + \iint_{S_2} F \cos\gamma\,dA.$$

At a point on $S_1$ the outer normal extends downward, so that $\gamma$ is obtuse and 
$\cos\gamma < 0$. On $S_2$, however, $\gamma$ is acute (see Fig. 157). The surface integrals over 
$S_1$ and $S_2$ can be transformed into double integrals over G by formula (15.51-3). 
In the latter formula $\gamma$ represents the acute angle between the positive z-axis 
and the undirected normal to the surface. Hence since $\sec(\pi - \gamma) = -\sec\gamma$, we 
have 

$$\iint_{S_1} F(x, y, z) \cos\gamma\,dA = \iint_G F(x, y, Z_1) \cos\gamma \sec(\pi - \gamma)\,dx\,dy = -\iint_G F(x, y, Z_1)\,dx\,dy.$$

Likewise 

$$\iint_{S_2} F(x, y, z) \cos\gamma\,dA = \iint_G F(x, y, Z_2) \cos\gamma \sec\gamma\,dx\,dy = \iint_G F(x, y, Z_2)\,dx\,dy.$$

Thus we have shown that 

$$\iint_S F \cos\gamma\,dA = \iint_G [F(x, y, Z_2) - F(x, y, Z_1)]\,dx\,dy.$$

On comparing this result with (15.6-3) we see that the lemma is proved. 

Clearly, if T is yz-simple, we have a result analogous to (15.6-2) with $\dfrac{\partial F}{\partial x}$ 
instead of $\dfrac{\partial F}{\partial z}$, and $\cos\alpha$ instead of $\cos\gamma$; a corresponding result also holds for 
zx-simple regions. No additional proofs are needed, since the results differ from 
(15.6-2) in notation only, and the labeling of the axes is purely a matter of 
notation. 

Now suppose that T is a region which is at once xy-simple, yz-simple, and 
zx-simple, and let S be its surface. Suppose that P, Q, R are functions which are 
continuous and have continuous first partial derivatives in T. At a point where S 
is smooth let $\mathbf{n}$ be a unit vector normal to S and extending outward from T, and let 
$\mathbf{n}$ make angles $\alpha$, $\beta$, $\gamma$ respectively with the positive x-, y-, and z-axis. Then by 
the lemma 

$$\iiint_T \frac{\partial P}{\partial x}\,dV = \iint_S P \cos\alpha\,dA,$$

$$\iiint_T \frac{\partial Q}{\partial y}\,dV = \iint_S Q \cos\beta\,dA,$$

$$\iiint_T \frac{\partial R}{\partial z}\,dV = \iint_S R \cos\gamma\,dA.$$

Adding, we obtain the divergence theorem (15.6-1) for a region T of this 
restricted type. 

Next we proceed to remove some of the restrictions on T. Let us call a region 
xyz-simple if it is at once xy-simple, yz-simple, and zx-simple. The region 
between two concentric spheres (say with centers at O) is not xyz-simple. But 
the three co-ordinate planes divide this region into eight parts, each of which is 
xyz-simple. In this subdivision process, certain additional surfaces are introduced 
as "interior partitions" in T. Each surface element of such an interior 
partition is on the boundary of two xyz-simple subregions. Let us say that a 
region T is xyz-standard if, by the introduction of a finite number of simple 
surface elements as interior partitions, we can divide T into a finite number of 
xyz-simple subregions. 

THEOREM VII. Under the stated assumptions on P, Q, R, formula (15.6-1) 
holds when T is an xyz-standard region. 

Proof. Let $T_1, \ldots, T_n$ be the xyz-simple regions composing T, and let $S_k$ be 
the entire surface of $T_k$. The surface $S_k$ may consist partly of pieces of S and 
partly of interior partitions. By what we have already proved, 

$$\iiint_{T_k} \left(\frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z}\right) dV = \iint_{S_k} (P \cos\alpha + Q \cos\beta + R \cos\gamma)\,dA.$$

Here we have n formulas. We add them; the sum of the triple integrals is just 
the triple integral over T. Let us see what we get when we add the surface 
integrals. Suppose that $T_j$ and $T_k$ are adjacent regions, and let $S_{jk}$ denote a 
surface element which is an interior partition between $T_j$ and $T_k$. If $\mathbf{n}_j$ is the unit 
vector normal to $S_{jk}$ outward from $T_j$, and $\mathbf{n}_k$ is the unit vector normal to $S_{jk}$ outward 
from $T_k$, then $\mathbf{n}_j = -\mathbf{n}_k$, and so the direction cosines of $\mathbf{n}_j$ are the negatives of those 
of $\mathbf{n}_k$. The result is that, when the surface integrals over $S_j$ and $S_k$ are added, the 
integrals over $S_{jk}$ cancel each other out. Thus, when all the surface integrals are 
added, the contributions from all the interior partitions cancel in pairs, and we are 
left with just the integral over the original boundary surface S. This shows that 
formula (15.6-1) holds true. 

The theorem can be written in vector form. Indeed, it is from the vector 
form that the theorem derives its name. 


THEOREM VIII. (THE DIVERGENCE THEOREM.) Let $\mathbf{F}$ be a vector point 
function which is defined and continuously differentiable in a bounded closed 
region T. Suppose that, for some choice of rectangular co-ordinate system, T 
is an xyz-standard region. Let S be the surface of T, and $\mathbf{n}$ the unit outer 
normal vector to S. Then 

$$\iiint_T \operatorname{div} \mathbf{F}\,dV = \iint_S \mathbf{F} \cdot \mathbf{n}\,dA. \qquad (15.6\text{-}4)$$

This form of the theorem requires a few additional words of proof. Let 
P, Q, and R be the x, y, and z components, respectively, of $\mathbf{F}$. Then 

$$\operatorname{div} \mathbf{F} = \frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z}, \qquad (15.6\text{-}5)$$

$$\mathbf{F} \cdot \mathbf{n} = P \cos\alpha + Q \cos\beta + R \cos\gamma, \qquad (15.6\text{-}6)$$

so that (15.6-4) and (15.6-1) are equivalent. Now $\operatorname{div} \mathbf{F}$ and $\mathbf{F} \cdot \mathbf{n}$ are independent 
of the way in which the rectangular co-ordinate system is chosen. Hence, if 
(15.6-4) is true for one choice of the co-ordinate system, it is true for all choices. 
This shows us that T need not be an xyz-standard region for all orientations of 
the axes; if it is an xyz-standard region for some orientation, that is sufficient. 

The restriction to xyz-standard regions is not absolutely necessary; it is 
imposed in order to make the proof simple. Considerations similar to those 
mentioned in §15.31 will enable one to prove Theorem VIII for certain regions 
which are never xyz-standard for any choice of axes. We shall not discuss these 
generalizations, however. 
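
A simple numerical experiment illustrates formula (15.6-1). The sketch below is our own illustration: it takes T to be the unit cube $0 \le x, y, z \le 1$ (which is xyz-simple) and the hypothetical field $P = xy$, $Q = yz$, $R = zx$, and compares the triple integral of the divergence with the outward flux through the six faces, both by the midpoint rule.

```python
def field(x, y, z):
    """Illustrative vector field (P, Q, R) = (xy, yz, zx)."""
    return (x * y, y * z, z * x)

def divergence(x, y, z):
    """dP/dx + dQ/dy + dR/dz for the field above."""
    return y + z + x

def volume_integral(n=40):
    """Midpoint-rule triple integral of the divergence over the unit cube."""
    h = 1.0 / n
    pts = [(k + 0.5) * h for k in range(n)]
    return sum(divergence(x, y, z) for x in pts for y in pts for z in pts) * h ** 3

def outward_flux(n=40):
    """Midpoint-rule surface integral of F . n over the six faces of the unit cube."""
    h = 1.0 / n
    pts = [(k + 0.5) * h for k in range(n)]
    total = 0.0
    for s in pts:
        for t in pts:
            # Faces x = 1 and x = 0: outward normals (1,0,0) and (-1,0,0).
            total += field(1.0, s, t)[0] - field(0.0, s, t)[0]
            # Faces y = 1 and y = 0: outward normals (0,1,0) and (0,-1,0).
            total += field(s, 1.0, t)[1] - field(s, 0.0, t)[1]
            # Faces z = 1 and z = 0: outward normals (0,0,1) and (0,0,-1).
            total += field(s, t, 1.0)[2] - field(s, t, 0.0)[2]
    return total * h * h

triple = volume_integral()
flux = outward_flux()
print(triple, flux)
```

For this field both sides equal 3/2; the midpoint rule is exact here because every integrand is linear in each variable.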

Example 1. Let S be the surface of a region T for which the divergence 
theorem is applicable. Let O be any fixed point in space, and let P(x, y, z) be a 
variable point of S. Show that the volume of T is given by 

$$V = \frac{1}{3} \iint_S r \cos\psi\,dA, \qquad (15.6\text{-}7)$$

where r is the distance OP and $\psi$ is the angle between the directed line OP and 
the outer normal to S at P. 

To show this, take $P = x$, $Q = y$, $R = z$ in (15.6-5). Then from (15.6-4) we 
have 

$$\iiint_T 3\,dV = \iint_S r \cos\psi\,dA,$$

which gives (15.6-7) at once. 
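
Formula (15.6-7) is easy to test on a rectangular box, where $r \cos\psi$ (the component of OP along the outer normal) is constant on each face. The following sketch is our own illustration; the function name and the sample dimensions and position of O are assumptions. Note that O need not lie inside the box.

```python
def box_volume_via_surface(a, b, c, origin):
    """(1/3) * integral of r cos(psi) over the surface of the box [0,a] x [0,b] x [0,c].
    On each flat face, r cos(psi) = OP . n is constant, so the face contributes
    (OP . n) * (area of the face)."""
    ox, oy, oz = origin
    total = (a - ox) * b * c + ox * b * c     # faces x = a (n = +i) and x = 0 (n = -i)
    total += (b - oy) * a * c + oy * a * c    # faces y = b and y = 0
    total += (c - oz) * a * b + oz * a * b    # faces z = c and z = 0
    return total / 3.0

inside = box_volume_via_surface(2.0, 3.0, 4.0, (0.3, 1.0, 2.5))
outside = box_volume_via_surface(2.0, 3.0, 4.0, (0.3, -5.0, 7.5))
print(inside, outside)
```

Both choices of O give the volume $2 \cdot 3 \cdot 4 = 24$; when O is outside the box, some faces contribute negatively because $\cos\psi < 0$ there.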

The divergence theorem is useful in connection with the consideration of 
solid angles. For an exposition of this matter we begin with a definition. Let S be 
a surface element, and O a fixed point not on S. Assume that S is not intersected 
more than once by any ray from O, and that no such ray is tangent to S. As a 
point P varies over S, consider the point Q in which the ray OP (extended, if 
necessary) intersects the unit sphere with center at O. These points Q fill out a 
certain portion of the surface of the unit sphere. The area $\omega$ of this portion is 
defined to be the solid angle subtended by S at O. 

Example 2. Show that the solid angle is given by 

$$\omega = \iint_S \frac{\cos\psi}{r^2}\,dA = \iint_S \frac{\overrightarrow{OP} \cdot \mathbf{n}}{r^3}\,dA, \qquad (15.6\text{-}8)$$

where r is the distance OP and $\mathbf{n}$ is the unit normal to S at P in such a direction 
that the angle $\psi$ between $\mathbf{n}$ and OP is acute. 

To derive (15.6-8) we use the divergence theorem as follows: Construct the 
sphere with radius a and center O. Choose a so small that S lies entirely outside 
this sphere. Draw all the rays joining O to S, and let $\Sigma$ be the portion of the 
sphere pierced by these rays. Let T be the solid region formed by the bundle of 
rays cut off between $\Sigma$ and S (see Fig. 158). For $\mathbf{F}$ we take the vector function 
$\overrightarrow{OP}/r^3$, with components $x/r^3$, $y/r^3$, $z/r^3$. We apply the divergence 
theorem. Since $r^2 = x^2 + y^2 + z^2$, it is readily calculated that $\operatorname{div} \mathbf{F} = 0$; 
thus 

$$\iiint_T (\operatorname{div} \mathbf{F})\,dV = 0.$$

Therefore, by (15.6-4), the integral of $\mathbf{F} \cdot \mathbf{n}$ over the entire 
surface of T is equal to zero. Here $\mathbf{n}$ refers to the outer 
normal. The surface of T consists of $\Sigma$, S, and the lateral 
surface formed by the rays joining O to the edge of S. 
Since $\mathbf{F}$ has the same direction as OP, $\mathbf{F} \cdot \mathbf{n} = 0$ on this lateral portion, so that the 
surface integral over this portion is zero. Thus 

$$\iint_\Sigma \mathbf{F} \cdot \mathbf{n}\,dA + \iint_S \mathbf{F} \cdot \mathbf{n}\,dA = 0. \qquad (15.6\text{-}9)$$

The integral over S in this equation is the same as the surface integral in 
(15.6-8). Thus we have to show that 

$$\iint_\Sigma \mathbf{F} \cdot \mathbf{n}\,dA = -\omega. \qquad (15.6\text{-}10)$$

This is easy, however. On $\Sigma$ the outer normal points away from T, which is 
toward O. But $\mathbf{F}$ points away from O, and has magnitude $1/r^2$. Thus, since $r = a$ 
on $\Sigma$, $\mathbf{F} \cdot \mathbf{n} = -1/a^2$, and the integral in (15.6-10) is 

$$-\frac{1}{a^2} \iint_\Sigma dA = -\frac{1}{a^2}\,(\text{area of } \Sigma).$$

If $a = 1$, the area of $\Sigma$ is $\omega$, by the definition of the solid angle. But, in any case, 

$$\frac{1}{a^2}\,(\text{area of } \Sigma) = \omega,$$

since the area of $\Sigma$ is proportional to $a^2$ when we vary a. This completes the 
proof of (15.6-10) and (15.6-8). 
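
Formula (15.6-8) can be tried on a case where the solid angle is known independently. The sketch below is our own illustration: S is a flat disk of radius R in the plane $z = h$, centered on the z-axis, with O at the origin, so that the rays from O to S form a right circular cone of half-angle $\arctan(R/h)$. Such a cone cuts from the unit sphere a cap of area $2\pi(1 - h/\sqrt{h^2 + R^2})$. The function name and step count are assumptions.

```python
import math

def disk_solid_angle(R, h, n=20000):
    """Midpoint-rule value of the integral of cos(psi)/r^2 over a disk of radius R
    lying in the plane z = h, as seen from the origin (rings of radius s, width ds)."""
    ds = R / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * ds                 # distance of the ring from the z-axis
        r2 = h * h + s * s                 # squared distance OP
        cos_psi = h / math.sqrt(r2)        # normal to the disk is parallel to the z-axis
        total += cos_psi / r2 * 2.0 * math.pi * s * ds
    return total

R, h = 1.0, 1.0
numeric_omega = disk_solid_angle(R, h)
cap_area = 2.0 * math.pi * (1.0 - h / math.sqrt(h * h + R * R))
print(numeric_omega, cap_area)
```

The two values should agree closely. As R grows with h fixed, the computed solid angle approaches $2\pi$, half the area of the unit sphere.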

EXERCISES 

1. Use (15.6-7) to show that the volume of any cone or pyramid is $\frac{1}{3} B h$, where h is 
the altitude and B is the area of the base. Take O at the vertex. 

2. If S is the hemispherical surface $z = (a^2 - x^2 - y^2)^{1/2}$, show that the centroid of the 
solid hemisphere is given by 

$$\bar{z} = \frac{3}{4\pi a^3} \iint_S z^2 \cos\gamma\,dA,$$

where $\gamma$ is the acute angle between the normal and the positive z-axis. 

3. Use the divergence theorem to show that 

$$\iint_S (x^2 \cos\alpha + y^2 \cos\beta + z^2 \cos\gamma)\,dA = \frac{8\pi a^4}{3},$$

if S is the surface $x^2 + y^2 + z^2 = 2az$ and $\alpha$, $\beta$, $\gamma$ are direction angles of the outer normal. 

4. What is the value of the integral in Exercise 3 if S is the surface of the cube 
bounded by $x = 0$, $x = a$, $y = 0$, $y = a$, $z = 0$, $z = a$? 

5. Let F= rOP, where r is the distance OP. Use the divergence theorem to show 
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that 


j j F-ndA = 4 j j'J rdV. 

S T 

By evaluating the surface integral, show that 


JJJ(x 2 +y 2 + zY 2 dV = irb 4 

T 

if T is the spherical region x 2 +y 2 +z 2 ^b 2 . Check by direct evaluation of the triple 
integral. 

6. Let $\mathbf{F}$ be the vector function $(x^2 + y^2 + z^2)(x\mathbf{i} + y\mathbf{j} + z\mathbf{k})$. If $T$ is the region
$x^2 + y^2 + z^2 \le b^2$, calculate $\iiint_T (\nabla\cdot\mathbf{F})\,dV$ directly, and also by the divergence theorem.
Observe that the value of the surface integral can be written down by inspection, since
$\mathbf{F}\cdot\mathbf{n} = (x^2 + y^2 + z^2)^{3/2}$ on $S$.

7. Show that, in the notation of (15.6-1),

$$\iint_S (x^2 + y^2)(x\cos\alpha + y\cos\beta)\,dA = 4I_z,$$

where $T$ is regarded as a homogeneous mass of unit density and $I_z$ is its moment of
inertia about the z-axis.

8. If $\mathbf{F} = x\mathbf{i} - y\mathbf{j} + z^2\mathbf{k}$, calculate $\iint_S \mathbf{F}\cdot\mathbf{n}\,dA$, where $S$ is the entire surface of any
right circular cylinder of radius $b$ with one base in the plane $z = 1$ and the other base in
the plane $z = 3$.

9. Let $S$ be the ellipsoidal surface

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1.$$

If $(x, y, z)$ is a point on the ellipsoid, and $D$ is the distance from the origin to the plane
tangent to the ellipsoid at $(x, y, z)$, show that $D^{-1} = \mathbf{F}\cdot\mathbf{n}$, where $\mathbf{n}$ is the unit outer normal
to $S$ at $(x, y, z)$, and $\mathbf{F}$ is the vector with components $x/a^2$, $y/b^2$, $z/c^2$. Hence, show that

$$\iint_S \frac{1}{D}\,dA = \frac{4\pi}{3}\left(\frac{bc}{a} + \frac{ca}{b} + \frac{ab}{c}\right).$$



10. Consider a limiting process in which a region $T$ shrinks down onto a certain fixed
point. If $V$ is the volume of $T$, show that the divergence of a vector function at this point
is given by

$$\operatorname{div}\mathbf{F} = \lim \frac{1}{V}\iint_S \mathbf{F}\cdot\mathbf{n}\,dA.$$

This formula is sometimes used as an invariant definition of the divergence. It is useful
for calculating div F in terms of various systems of curvilinear co-ordinates. See Exercise 11.

11. Let $\rho$, $\theta$, $\phi$ be the usual spherical co-ordinates, and let $F_\rho$, $F_\theta$, $F_\phi$ denote the
components of a vector function $\mathbf{F}$ in the $\rho$, $\theta$, $\phi$ directions respectively. Show that

$$\operatorname{div}\mathbf{F} = \frac{1}{\rho^2\sin\phi}\left[\frac{\partial}{\partial\rho}\left(\rho^2\sin\phi\,F_\rho\right) + \frac{\partial}{\partial\theta}\left(\rho F_\theta\right) + \frac{\partial}{\partial\phi}\left(\rho\sin\phi\,F_\phi\right)\right].$$

Obtain this by using the result in Exercise 10, using for $T$ a region of the “volume
element” type in spherical co-ordinates, with pairs of opposite faces corresponding to
values $\rho$ and $\rho + \Delta\rho$, $\theta$ and $\theta + \Delta\theta$, $\phi$ and $\phi + \Delta\phi$. Then let $\Delta\rho$, $\Delta\theta$, and $\Delta\phi$ approach zero.
Hint: The volume of $T$ is approximately $\rho^2\sin\phi\,\Delta\rho\,\Delta\theta\,\Delta\phi$. The surface integral over
the pair of faces corresponding to $\rho$ and $\rho + \Delta\rho$ is approximately equal to

$$\left[F_\rho(\rho + \Delta\rho, \theta, \phi)(\rho + \Delta\rho)^2\sin\phi - F_\rho(\rho, \theta, \phi)\,\rho^2\sin\phi\right]\Delta\theta\,\Delta\phi,$$

and analogous results can be written down for the other pairs of faces.

This procedure can be adapted to any system of orthogonal curvilinear co-ordinates.
The reasoning can be made quite precise with the aid of mean-value theorems.


15.61 / GREEN’S IDENTITIES

Among the important consequences of the divergence theorem are certain 
integral formulas known as Green’s identities. These formulas play a significant 
role in connection with many of the partial differential equations of applied 
mathematics. The first identity is 


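$$\iiint_T \left(u\,\Delta v + \frac{\partial u}{\partial x}\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\frac{\partial v}{\partial y} + \frac{\partial u}{\partial z}\frac{\partial v}{\partial z}\right) dV = \iint_S u\,\frac{\partial v}{\partial n}\,dA, \tag{15.61-1}$$

where we have used the notation

$$\Delta v = \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} + \frac{\partial^2 v}{\partial z^2}.$$

The expression $\Delta v$ is called the Laplacian of $v$. In the surface integral, $\partial/\partial n$
refers to the directional derivative in the direction of $\mathbf{n}$, the outer normal to $S$. It
is assumed that $T$ is a region to which the divergence theorem applies, and that $u$
and $v$, together with the first derivatives of $u$ and the first and second derivatives
of $v$, are continuous in $T$.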

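To prove the correctness of (15.61-1), choose

$$P = u\,\frac{\partial v}{\partial x}, \qquad Q = u\,\frac{\partial v}{\partial y}, \qquad R = u\,\frac{\partial v}{\partial z}$$

in the divergence theorem. Then

$$\frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z}$$

is the expression under the triple integral in (15.61-1). On the other hand,

$$P\cos\alpha + Q\cos\beta + R\cos\gamma = u\,\frac{\partial v}{\partial n},$$

because

$$\frac{\partial v}{\partial x}\cos\alpha + \frac{\partial v}{\partial y}\cos\beta + \frac{\partial v}{\partial z}\cos\gamma = \frac{\partial v}{\partial n}.$$

Thus (15.61-1) is just a particular case of the divergence theorem.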
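
The claim that the divergence of $(P, Q, R)$ reproduces the integrand of (15.61-1) rests only on the product rule. Written out as a sketch in the notation of this section, the intermediate step is:

```latex
\begin{aligned}
\frac{\partial P}{\partial x} &= \frac{\partial}{\partial x}\!\left(u\,\frac{\partial v}{\partial x}\right)
  = \frac{\partial u}{\partial x}\frac{\partial v}{\partial x} + u\,\frac{\partial^2 v}{\partial x^2},\\
\frac{\partial Q}{\partial y} &= \frac{\partial u}{\partial y}\frac{\partial v}{\partial y} + u\,\frac{\partial^2 v}{\partial y^2},\qquad
\frac{\partial R}{\partial z} = \frac{\partial u}{\partial z}\frac{\partial v}{\partial z} + u\,\frac{\partial^2 v}{\partial z^2},\\
\frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z}
  &= u\,\Delta v + \frac{\partial u}{\partial x}\frac{\partial v}{\partial x}
   + \frac{\partial u}{\partial y}\frac{\partial v}{\partial y}
   + \frac{\partial u}{\partial z}\frac{\partial v}{\partial z}.
\end{aligned}
```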

The identity can be written in a more compact form by using vector
notation. If $v$ is a scalar function, and $\nabla v$ is the gradient of $v$, we recall that the
component of $\nabla v$ in any particular direction is the directional derivative of $v$ in
that direction. Hence, at a point on $S$,

$$\nabla v\cdot\mathbf{n} = \frac{\partial v}{\partial n}.$$

Likewise, we observe that the divergence of $\nabla v$ is the Laplacian of $v$:

$$\nabla\cdot(\nabla v) = \Delta v.$$

Since $\nabla\cdot(\nabla v)$ is conventionally written as $\nabla^2 v$, this latter notation is often used
in place of $\Delta v$ for the Laplacian of $v$. We may now write (15.61-1) in the form

$$\iiint_T \left(u\nabla^2 v + \nabla u\cdot\nabla v\right) dV = \iint_S u\,\nabla v\cdot\mathbf{n}\,dA. \tag{15.61-2}$$

This is the particular case of (15.6-4) in which $\mathbf{F} = u\nabla v$.

We refer to (15.61-1) or (15.61-2) as Green’s first identity.

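Green’s second identity is

$$\iiint_T (u\,\Delta v - v\,\Delta u)\,dV = \iint_S \left(u\,\frac{\partial v}{\partial n} - v\,\frac{\partial u}{\partial n}\right) dA, \tag{15.61-3}$$

or, in a different notation,

$$\iiint_T \left(u\nabla^2 v - v\nabla^2 u\right) dV = \iint_S (u\nabla v - v\nabla u)\cdot\mathbf{n}\,dA. \tag{15.61-4}$$

This is deduced from the first identity by exchanging $u$ and $v$ and subtracting.
The second identity presumes that both $u$ and $v$ have continuous second
derivatives in $T$.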
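
A numerical spot check can make Green’s first identity concrete. The sketch below takes $T$ to be the unit cube and uses the functions $u = xy$, $v = x^2 + yz$; these are illustrative choices, not taken from the text. It compares the two sides of (15.61-1) by the midpoint rule.

```python
def midpoint_nodes(n):
    """Midpoints of n equal subintervals of [0, 1]."""
    return [(i + 0.5) / n for i in range(n)]

# Illustrative choices (assumptions, not from the text): u = x*y, v = x**2 + y*z.
def u(x, y, z): return x * y
def grad_u(x, y, z): return (y, x, 0.0)
def grad_v(x, y, z): return (2.0 * x, z, y)
def lap_v(x, y, z): return 2.0

def volume_side(n=20):
    """Triple integral of u*Lap(v) + grad u . grad v over the unit cube."""
    pts, w = midpoint_nodes(n), 1.0 / n ** 3
    total = 0.0
    for x in pts:
        for y in pts:
            for z in pts:
                gu, gv = grad_u(x, y, z), grad_v(x, y, z)
                dot = sum(a * b for a, b in zip(gu, gv))
                total += (u(x, y, z) * lap_v(x, y, z) + dot) * w
    return total

def surface_side(n=20):
    """Surface integral of u * dv/dn over the six faces of the unit cube."""
    pts, w = midpoint_nodes(n), 1.0 / n ** 2
    total = 0.0
    for axis in range(3):
        for side, sign in ((0.0, -1.0), (1.0, 1.0)):
            for a in pts:
                for b in pts:
                    p = [a, b]
                    p.insert(axis, side)  # point on the face x_axis = side
                    # Outer normal is sign * e_axis, so dv/dn = sign * dv/dx_axis.
                    total += u(*p) * sign * grad_v(*p)[axis] * w
    return total

print(volume_side(), surface_side())  # both should be close to 1.25
```

For these particular polynomials both integrals equal $5/4$ exactly, which is why a coarse grid already gives close agreement.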


EXERCISES 

1 . Show that 

J//P.dV — A 

T S 

2. If u is a function such that V 2 u = 0 in T, show that 


//■£ 


dA. 
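3. By putting $\mathbf{F} = \nabla u$ in Exercise 11, §15.6, we obtain the expression for the
Laplacian in spherical co-ordinates:

$$\nabla^2 u = \frac{1}{\rho^2\sin\phi}\left[\sin\phi\,\frac{\partial}{\partial\rho}\left(\rho^2\frac{\partial u}{\partial\rho}\right) + \frac{1}{\sin\phi}\frac{\partial^2 u}{\partial\theta^2} + \frac{\partial}{\partial\phi}\left(\sin\phi\,\frac{\partial u}{\partial\phi}\right)\right].$$
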

4. Show that the Laplacian can be expressed independently of all co-ordinate
systems in the form

$$\nabla^2 u = \lim \frac{1}{V}\iint_S \frac{\partial u}{\partial n}\,dA,$$

where the limit is taken as the surface $S$ enclosing the volume $V$ shrinks down on the
point at which the Laplacian is evaluated.


15.62 / TRANSFORMATION OF TRIPLE INTEGRALS 

We shall now take up the problem for triple integrals which corresponds to the
problem of §15.32 for double integrals. We suppose that the two sets of variables
$(x, y, z)$, $(u, v, w)$ are connected by certain equations

$$x = f(u, v, w), \qquad y = g(u, v, w), \qquad z = h(u, v, w), \tag{15.62-1}$$

and that these equations establish a one-to-one mapping of a region in uvw-space
onto a region in xyz-space. The functions $f$, $g$, $h$ are assumed to be continuous,
with continuous first and second partial derivatives. We write

$$J(u, v, w) = \frac{\partial(f, g, h)}{\partial(u, v, w)},$$

and assume that $J$ has a constant sign. Let $T$ be a closed and bounded region
which is in the region of xyz-space which is being mapped, and let $T'$ be the
corresponding region of uvw-space. We assume that $T$ and $T'$ are regions to
which the divergence theorem applies.


THEOREM IX. Let $V$ be the volume of $T$. Then

$$V = \iiint_{T'} |J(u, v, w)|\,du\,dv\,dw. \tag{15.62-2}$$


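Proof. The proof is similar to that of Theorem II, §15.32, but the calculations
are somewhat more complicated. Let us begin by considering the surfaces $S$, $S'$
of $T$ and $T'$ respectively. Suppose $S_0$ is a smooth simple surface element on $S$,
and that $S_0'$ is the corresponding element of $S'$. We assume that $S_0'$ is represented
parametrically in terms of certain parameters $s$, $t$. The correspondence set up by
the mapping (15.62-1) gives us a representation of $S_0$ in terms of these same
parameters. Let us write

$$j_1 = \frac{\partial(y, z)}{\partial(s, t)}, \qquad j_2 = \frac{\partial(z, x)}{\partial(s, t)}, \qquad j_3 = \frac{\partial(x, y)}{\partial(s, t)},$$

$$j_1' = \frac{\partial(v, w)}{\partial(s, t)}, \qquad j_2' = \frac{\partial(w, u)}{\partial(s, t)}, \qquad j_3' = \frac{\partial(u, v)}{\partial(s, t)}.$$

Also, let

$$D = \sqrt{j_1^2 + j_2^2 + j_3^2}, \qquad D' = \sqrt{j_1'^2 + j_2'^2 + j_3'^2}.$$

We first show that

$$j_3 = \frac{\partial(f, g)}{\partial(v, w)}\,j_1' + \frac{\partial(f, g)}{\partial(w, u)}\,j_2' + \frac{\partial(f, g)}{\partial(u, v)}\,j_3'. \tag{15.62-3}$$

There are similar formulas for $j_1$, $j_2$, but we shall not need them.
By (15.62-1) we see that

$$\frac{\partial x}{\partial s} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial s} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial s} + \frac{\partial f}{\partial w}\frac{\partial w}{\partial s},$$

with similar formulas for $\partial x/\partial t$, $\partial y/\partial s$, $\partial y/\partial t$.
In the interests of compactness in display, let us write

$$\frac{\partial f}{\partial u} = f_1, \qquad \frac{\partial f}{\partial v} = f_2, \qquad \frac{\partial u}{\partial s} = u_1, \qquad \frac{\partial u}{\partial t} = u_2,$$

and so on. Then we have

$$j_3 = \begin{vmatrix} \dfrac{\partial x}{\partial s} & \dfrac{\partial x}{\partial t} \\[2mm] \dfrac{\partial y}{\partial s} & \dfrac{\partial y}{\partial t} \end{vmatrix} = \begin{vmatrix} f_1 u_1 + f_2 v_1 + f_3 w_1 & f_1 u_2 + f_2 v_2 + f_3 w_2 \\ g_1 u_1 + g_2 v_1 + g_3 w_1 & g_1 u_2 + g_2 v_2 + g_3 w_2 \end{vmatrix}.$$

There are theorems on determinants by which the foregoing determinant may be
expanded in the form

$$j_3 = \begin{vmatrix} f_2 & f_3 \\ g_2 & g_3 \end{vmatrix}\begin{vmatrix} v_1 & v_2 \\ w_1 & w_2 \end{vmatrix} + \begin{vmatrix} f_3 & f_1 \\ g_3 & g_1 \end{vmatrix}\begin{vmatrix} w_1 & w_2 \\ u_1 & u_2 \end{vmatrix} + \begin{vmatrix} f_1 & f_2 \\ g_1 & g_2 \end{vmatrix}\begin{vmatrix} u_1 & u_2 \\ v_1 & v_2 \end{vmatrix}.$$

This latter form is equivalent to (15.62-3).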
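
The determinant expansion used here is an instance of what is usually called the Cauchy-Binet formula. As a quick sanity check, the sketch below fills the nine quantities $f_i$, $g_i$ and the six quantities $u_k$, $v_k$, $w_k$ with arbitrary random numbers (purely illustrative values) and compares the $2\times 2$ determinant with chain-rule entries against the sum of three products of $2\times 2$ determinants.

```python
import random

def det2(a, b, c, d):
    """Determinant of the 2x2 array [[a, b], [c, d]]."""
    return a * d - b * c

random.seed(0)
# f[i], g[i] stand for f_1, f_2, f_3 and g_1, g_2, g_3;
# u, v, w hold pairs such as (u_1, u_2). All values are arbitrary.
f = [random.uniform(-1, 1) for _ in range(3)]
g = [random.uniform(-1, 1) for _ in range(3)]
u, v, w = ([random.uniform(-1, 1) for _ in range(2)] for _ in range(3))

def row(c, k):
    """Chain-rule entry c_1*u_k + c_2*v_k + c_3*w_k, with k = 0 for s and k = 1 for t."""
    return c[0] * u[k] + c[1] * v[k] + c[2] * w[k]

# Left side: the determinant with chain-rule entries.
lhs = det2(row(f, 0), row(f, 1), row(g, 0), row(g, 1))

# Right side: the sum of products of 2x2 determinants.
rhs = (det2(f[1], f[2], g[1], g[2]) * det2(v[0], v[1], w[0], w[1])
       + det2(f[2], f[0], g[2], g[0]) * det2(w[0], w[1], u[0], u[1])
       + det2(f[0], f[1], g[0], g[1]) * det2(u[0], u[1], v[0], v[1]))

print(abs(lhs - rhs))  # should be at the level of rounding error
```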

It was shown in §14.4 that the direction cosines of the normal to $S_0$ are given
by

$$\cos\alpha = \frac{j_1}{D}, \qquad \cos\beta = \frac{j_2}{D}, \qquad \cos\gamma = \frac{j_3}{D}. \tag{15.62-4}$$

In an arbitrary parametrization these may or may not refer to the outward
direction of the normal, but since an exchange of the parameters $s$, $t$ changes the
sign of each of the Jacobians $j_1$, $j_2$, $j_3$, we may (and shall) assume that the
parameters have been chosen so that (15.62-4) give the direction of the outer
normal to $S_0$. The direction cosines of the outer normal to $S_0'$ are

$$\cos\alpha' = \pm\frac{j_1'}{D'}, \qquad \cos\beta' = \pm\frac{j_2'}{D'}, \qquad \cos\gamma' = \pm\frac{j_3'}{D'}, \tag{15.62-5}$$

where the sign may be + or -, but is the same in all three equations. As we shall
see, the sign depends upon the sign of $J(u, v, w)$.

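Now, by the divergence theorem with $P = Q = 0$, $R = z$, we have

$$V = \iiint_T dx\,dy\,dz = \iint_S z\cos\gamma\,dA.$$

We show that

$$\iint_S z\cos\gamma\,dA = \pm\iint_{S'} h(u, v, w)\left[\frac{\partial(f, g)}{\partial(v, w)}\cos\alpha' + \frac{\partial(f, g)}{\partial(w, u)}\cos\beta' + \frac{\partial(f, g)}{\partial(u, v)}\cos\gamma'\right] dA, \tag{15.62-6}$$
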


where the choice of sign is the same as in (15.62-5). It is enough to carry out the
demonstration for each pair of corresponding surface elements $S_0$, $S_0'$. By
(15.51-4) and (15.62-4) we see that

$$\iint_{S_0} z\cos\gamma\,dA = \iint_R (z\cos\gamma)\,D\,ds\,dt = \iint_R z\,j_3\,ds\,dt,$$

where $R$ is the region in the st-plane corresponding to the element $S_0$. The last
integral may be transformed into a surface integral over $S_0'$ (again by (15.51-4)):

$$\iint_R z\,j_3\,ds\,dt = \iint_{S_0'} \frac{z\,j_3}{D'}\,dA.$$

We now express the integrand in terms of $u$, $v$, $w$, using $z = h(u, v, w)$, (15.62-3)
and (15.62-5). The result is the integrand on the right in (15.62-6), and thus the
latter formula is established.

The next step is the application of the divergence theorem in the uvw-space,
with

$$P = h\,\frac{\partial(f, g)}{\partial(v, w)}, \qquad Q = h\,\frac{\partial(f, g)}{\partial(w, u)}, \qquad R = h\,\frac{\partial(f, g)}{\partial(u, v)}.$$

A calculation shows that

$$\frac{\partial P}{\partial u} + \frac{\partial Q}{\partial v} + \frac{\partial R}{\partial w} = \frac{\partial(f, g, h)}{\partial(u, v, w)} = J(u, v, w);$$

there are certain second-derivative terms, but they cancel each other out. Thus,
by (15.62-6) and the divergence theorem,

$$V = \pm\iiint_{T'} J(u, v, w)\,du\,dv\,dw.$$

This is equivalent to (15.62-2), for $J$ is of constant sign, and $V$ is of course
positive.

It follows from the foregoing that the positive signs are to be chosen in
(15.62-5) when $J > 0$, while the negative signs are to be chosen if $J < 0$.


With the aid of Theorem IX we can prove the following three-dimensional 
analogue of Theorem III, §15.32: 


THEOREM X. If $F(x, y, z)$ is continuous in $T$ and if the mapping meets the
requirements stated prior to Theorem IX, we have

$$\iiint_T F(x, y, z)\,dx\,dy\,dz = \iiint_{T'} F(x, y, z)\left|\frac{\partial(x, y, z)}{\partial(u, v, w)}\right| du\,dv\,dw,$$

where, on the right, $F(x, y, z)$ is to be expressed as a function of $u$, $v$, $w$ by
means of (15.62-1).

The proof is entirely analogous to that of Theorem III. 
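
Theorem IX can be tried out numerically. The sketch below uses the spherical co-ordinate mapping with $u = \rho$, $v = \phi$, $w = \theta$ ($\phi$ measured from the positive z-axis), estimates the Jacobian by central differences rather than by formula, and integrates $|J|$ over the box $0 \le \rho \le 1$, $0 \le \phi \le \pi$, $0 \le \theta \le 2\pi$. The function names, step size, and grid size are illustrative assumptions; the result should be close to the volume $4\pi/3$ of the unit ball.

```python
import math

def mapping(rho, phi, theta):
    """Spherical co-ordinates with phi measured from the positive z-axis."""
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))

def jacobian(p, h=1e-5):
    """Central-difference estimate of d(x, y, z)/d(u, v, w) at the point p."""
    cols = []
    for k in range(3):
        hi, lo = list(p), list(p)
        hi[k] += h
        lo[k] -= h
        a, b = mapping(*hi), mapping(*lo)
        cols.append([(a[i] - b[i]) / (2 * h) for i in range(3)])
    # cols[k][i] = d x_i / d u_k; transposing does not change the determinant.
    (a, b, c), (d, e, f), (g, hh, i) = cols
    return a * (e * i - f * hh) - b * (d * i - f * g) + c * (d * hh - e * g)

def ball_volume(n=24):
    """Midpoint rule for the integral of |J| over the (rho, phi, theta) box."""
    dr, dp, dt = 1.0 / n, math.pi / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                p = ((i + 0.5) * dr, (j + 0.5) * dp, (k + 0.5) * dt)
                total += abs(jacobian(p)) * dr * dp * dt
    return total

print(ball_volume(), 4 * math.pi / 3)  # the two numbers should nearly agree
```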


EXERCISES 

1. Compute the Jacobian $J(u, v, w)$ for the transformation

$$x = \rho\sin\phi\cos\theta, \qquad y = \rho\sin\phi\sin\theta, \qquad z = \rho\cos\phi,$$

where $u = \rho$, $v = \phi$, $w = \theta$, and compare Theorem X for this case with the known results
about evaluating triple integrals in terms of spherical co-ordinates.

2. Consider the transformation

$$x = u + 2v + 3w, \qquad y = 4u + 5v, \qquad z = 6u.$$

(a) If $R$ is a region in xyz-space, and $R'$ is the corresponding region in uvw-space, what is
the volume of $R$ if the volume of $R'$ is 10?

(b) If the centroid of $R$ is at (9, 45, 30), find the centroid of $R'$.

3. Use the transformation $x = au$, $y = bv$, $z = cw$ to calculate the triple integral
$\iiint_R (x^2 + y^2)\,dx\,dy\,dz$, where $R$ is defined by $(x^2/a^2) + (y^2/b^2) + (z^2/c^2) \le 1$. The
transformed integral may be evaluated by use of spherical co-ordinates in uvw-space.

4. Consider the transformation

$$x = \frac{v}{u}\cos w, \qquad y = \frac{v}{u}\sin w, \qquad z = v^2.$$

Let $R$ be the region between the paraboloids $z = x^2 + y^2$, $z = 4(x^2 + y^2)$ and also between the
planes $z = 1$, $z = 4$. Use the transformation to calculate the integral

$$\iiint_R (x^2 + y^2)\,dx\,dy\,dz.$$


5. Consider the transformation

$$x = \frac{v}{1 + w}, \qquad y = \frac{vw}{1 + w}, \qquad z = u - v.$$

Show that the inverse transformation is

$$u = x + y + z, \qquad v = x + y, \qquad w = \frac{y}{x}.$$

Let $R$ be the region in the first octant of xyz-space under the plane $x + y + z = 2$ and
directly above the trapezoid in the xy-plane bounded by the lines $x + y = 1$, $x + y = 2$,
$y = 0$, $y = x$. Show that the corresponding region $R'$ in uvw-space is a prism with two
faces perpendicular to the w-axis and three faces parallel to that axis. Then evaluate 


fjf?i+yl dxd y dz 

R 

by transforming to an integral over R'. 

6. Use the transformation 

$$x = u(1 - v), \qquad y = uv(1 - w), \qquad z = uvw$$

to calculate 

fffxdxdydz and fff~^ 

R R 

where R is the tetrahedron cut from the first octant by the plane x + y + z = 1. 

7. If curvilinear co-ordinates $u$, $v$, $w$ are introduced into the first octant by the
transformation

$$x = v\cos w, \qquad y = v\sin w, \qquad z = \sqrt{u - v^2},$$

describe the surfaces on which $u$, $v$, $w$ respectively are constant. Describe the region $R'$
in uvw-space which corresponds to the region $R$ described as follows:

$$9 \le x^2 + y^2 + z^2 \le 16, \qquad 1 \le x^2 + y^2 \le 4, \qquad x \ge 0, \quad y \ge 0, \quad z \ge 0.$$

Calculate $\iiint_R z\,dx\,dy\,dz$ with the aid of the transformation.

8. Let curvilinear co-ordinates be introduced into the first octant by the
transformation

$$x = v, \qquad y = w, \qquad z = (u - v^2 - w^2)^{1/2};$$

show that, for small values of $\Delta u$, $\Delta v$, $\Delta w$, the volume in the first octant bounded by the
planes $x = v$, $x = v + \Delta v$, $y = w$, $y = w + \Delta w$, and the spheres $x^2 + y^2 + z^2 = u$,
$x^2 + y^2 + z^2 = u + \Delta u$ is approximately

$$\frac{\Delta u\,\Delta v\,\Delta w}{2\sqrt{u - v^2 - w^2}}.$$

9. If $R$ is the region defined by $x^2 + y^2 + z^2 \le 1$, and $p^2 = a^2 + b^2 + c^2 > 0$, show that

$$\iiint_R \cos(ax + by + cz)\,dx\,dy\,dz = \frac{4\pi}{p^3}(\sin p - p\cos p).$$

Hint: Make a rotation of the co-ordinate axes so that the plane $ax + by + cz = 0$
becomes the plane $z' = 0$. Then use cylindrical co-ordinates.

10. Suppose each point $(x, y, z)$ moves according to the law

$$x = \xi(1 + t^2), \qquad y = \eta e^{t}, \qquad z = \zeta e^{2t},$$

so that the point is at $(\xi, \eta, \zeta)$ when $t = 0$. If the components of velocity are computed in
terms of $x$, $y$, $z$ we find

$$\frac{dx}{dt} = 2t\xi = \frac{2tx}{1 + t^2}, \qquad \frac{dy}{dt} = \eta e^{t} = y, \qquad \frac{dz}{dt} = 2\zeta e^{2t} = 2z,$$

so that the velocity is the vector

$$\mathbf{F} = \frac{2tx}{1 + t^2}\,\mathbf{i} + y\,\mathbf{j} + 2z\,\mathbf{k}.$$

Now let $R_0$ be any fixed region in $\xi\eta\zeta$-space, and let $R$ be the region in xyz-space into
which $R_0$ has been carried at time $t$. If $V$ is the volume of $R$, show that

$$\frac{dV}{dt} = \iiint_R (\nabla\cdot\mathbf{F})\,dx\,dy\,dz.$$

Actually, the validity of this equation does not depend on the particular equations of
motion.


11. Show the last formula in Exercise 10 holds for the case in which the equations of
motion are

$$x = \frac{\xi}{1 + \xi t}, \qquad y = \eta + t, \qquad z = \zeta e^{-t}.$$

In this case the velocity vector is $\mathbf{F} = -x^2\mathbf{i} + \mathbf{j} - z\mathbf{k}$, and it is to be assumed that $\xi > 0$ in
$R_0$.


15.7 / STOKES’S THEOREM 

Stokes’s theorem is the formula 


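$$\iint_S \left[\left(\frac{\partial R}{\partial y} - \frac{\partial Q}{\partial z}\right)\cos\alpha + \left(\frac{\partial P}{\partial z} - \frac{\partial R}{\partial x}\right)\cos\beta + \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)\cos\gamma\right] dA = \int_C P\,dx + Q\,dy + R\,dz. \tag{15.7-1}$$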

This formula has an important vector interpretation, given in (15.7-6). 
Stokes’s theorem plays a role in three dimensions much like that of Green’s 
theorem in the plane (§15.3). Both the divergence theorem and Stokes’s theorem 
are valuable tools in mathematical physics. We use Stokes’s theorem in §15.8. 
In formula (15.7-1) $S$ denotes a surface bounded by a curve $C$; $P$, $Q$, and $R$
denote functions of $x$, $y$, $z$. The normal to $S$ is constructed at each point, and a
certain direction is chosen along each normal so that all these directions point
away from $S$ on the same side of $S$. This is called the positive side of $S$. The
angles $\alpha$, $\beta$, $\gamma$ refer to the direction of the normal at $(x, y, z)$ on $S$. The curve $C$ is
oriented in such a way that, if a person walks along $C$ in the
positive direction, standing on the positive side of $S$, the
surface $S$ is always on his left. If $S$ is a simple surface
element, this means that the orientation of $C$ appears as
counterclockwise to a person standing on the positive side
of the surface (see Fig. 159). A more exact description of the
assumptions which we make about the surface $S$ and the
functions $P$, $Q$, $R$ will be given later on.

For the particular case in which $S$ is a region in the xy-plane, Stokes’s
theorem is identical with Green’s theorem in the plane; for in that case, if we
choose the positive side of $S$ as that toward the positive z direction, we see that
$\cos\alpha = 0$, $\cos\beta = 0$, $\cos\gamma = 1$. Therefore, since $dz = 0$ along $C$ in this case,
(15.7-1) becomes

$$\iint_S \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dA = \int_C P\,dx + Q\,dy,$$

and this is Green’s theorem in the plane.

We shall now prove formula (15.7-1) for the case in which $S$ is a simple
surface element. If we pick out of (15.7-1) just those terms which involve $P$, we
have the formula

$$\iint_S \left(\frac{\partial P}{\partial z}\cos\beta - \frac{\partial P}{\partial y}\cos\gamma\right) dA = \int_C P\,dx. \tag{15.7-2}$$

There are two other similar formulas, one involving $Q$ and one involving $R$. We
shall prove (15.7-2). The other two formulas will then be established by cyclic
permutation of the letters in the sets $(P, Q, R)$, $(x, y, z)$, $(\alpha, \beta, \gamma)$. Combination of
the three formulas will then give us (15.7-1).

We shall assume that the surface element $S$ is represented parametrically by
a one-to-one mapping

$$x = f(u, v), \qquad y = g(u, v), \qquad z = h(u, v),$$

the parameters $(u, v)$ ranging over a region $G$ in the uv-plane. We assume that $G$
is a regular region bounded by a simple closed curve $\Gamma$ which is sectionally
smooth. We also assume that the functions $f$, $g$, $h$ have continuous first and
second derivatives in $G$ and that the Jacobians

$$j_1 = \frac{\partial(y, z)}{\partial(u, v)}, \qquad j_2 = \frac{\partial(z, x)}{\partial(u, v)}, \qquad j_3 = \frac{\partial(x, y)}{\partial(u, v)}$$

are never all zero at once. Let $\Gamma$ be oriented counterclockwise in the uv-plane,
and let $C$ be oriented so that as $(u, v)$ goes along $\Gamma$ in the positive sense, $(x, y, z)$
goes along $C$ in the positive sense. Let

$$D = (j_1^2 + j_2^2 + j_3^2)^{1/2}.$$

Then for the direction of the normal toward the positive side of $S$ we have

$$\cos\alpha = \pm\frac{j_1}{D}, \qquad \cos\beta = \pm\frac{j_2}{D}, \qquad \cos\gamma = \pm\frac{j_3}{D}, \tag{15.7-3}$$

where the same choice of sign is to be made in each equation. We shall show
that the + sign must be chosen. Select any point of $S$ not on $C$. At this point $j_1$,
$j_2$, and $j_3$ are not all zero, and we shall suppose for definiteness that $j_3 \ne 0$. Then
near this point $j_3$ always has the same sign. Also, the normal near the point is
never perpendicular to the z-axis and so $\cos\gamma$ is always of the same sign. The
problem is to show that the sign of $\cos\gamma$ is the same as that of $j_3$. Let $C_0$ be a
small closed curve on the portion of $S$ we are considering. Let $\Gamma_0$ be the
corresponding curve in the uv-plane, and let $C_0'$ be the projection of $C_0$ in the
xy-plane (see Fig. 160). Let the orientations of $C_0$ and $\Gamma_0$ agree with those of $C$
and $\Gamma$ respectively, and let the orientation of $C_0'$ be that which is naturally
induced by the orientation of $C_0$. Now the equations $x = f(u, v)$, $y = g(u, v)$ may
be regarded as defining a mapping from the uv-plane to the xy-plane. Under this
mapping $\Gamma_0$ is mapped into $C_0'$. The Jacobian of the mapping is $j_3$, so, by the
discussion in §15.32, the orientation of $C_0'$ will be counterclockwise (as viewed
from the positive z-axis) if $j_3 > 0$, and clockwise if $j_3 < 0$. A little consideration of
the relation between $C_0$ and $C_0'$ then shows that $\cos\gamma > 0$ if $j_3 > 0$ and $\cos\gamma < 0$
if $j_3 < 0$. The case $j_3 > 0$ is illustrated in Fig. 160. This completes the
demonstration that the + signs are to be chosen in (15.7-3).

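We are now ready to proceed with the proof of (15.7-2). In terms of the
parameters $u$, $v$ the surface integral becomes

$$\iint_G \left(\frac{\partial P}{\partial z}\cos\beta - \frac{\partial P}{\partial y}\cos\gamma\right) D\,du\,dv = \iint_G \left(\frac{\partial P}{\partial z}\,j_2 - \frac{\partial P}{\partial y}\,j_3\right) du\,dv.$$

Fig. 160.

Here we have used (15.51-4) and (14.6-7). Next we show that

$$\frac{\partial P}{\partial z}\,j_2 - \frac{\partial P}{\partial y}\,j_3 = \frac{\partial P}{\partial u}\frac{\partial x}{\partial v} - \frac{\partial P}{\partial v}\frac{\partial x}{\partial u}. \tag{15.7-4}$$

In fact,

$$\frac{\partial P}{\partial u} = \frac{\partial P}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial P}{\partial y}\frac{\partial y}{\partial u} + \frac{\partial P}{\partial z}\frac{\partial z}{\partial u},$$

with a similar formula for $\partial P/\partial v$. Therefore,

$$\frac{\partial P}{\partial u}\frac{\partial x}{\partial v} - \frac{\partial P}{\partial v}\frac{\partial x}{\partial u} = \frac{\partial P}{\partial y}\left(\frac{\partial y}{\partial u}\frac{\partial x}{\partial v} - \frac{\partial y}{\partial v}\frac{\partial x}{\partial u}\right) + \frac{\partial P}{\partial z}\left(\frac{\partial z}{\partial u}\frac{\partial x}{\partial v} - \frac{\partial z}{\partial v}\frac{\partial x}{\partial u}\right),$$

and this is equivalent to (15.7-4).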

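The surface integral in (15.7-2) has now been reduced to the form

$$\iint_G \left(\frac{\partial P}{\partial u}\frac{\partial x}{\partial v} - \frac{\partial P}{\partial v}\frac{\partial x}{\partial u}\right) du\,dv.$$

An easy calculation shows that this is the same as

$$\iint_G \left[\frac{\partial}{\partial u}\left(P\,\frac{\partial x}{\partial v}\right) - \frac{\partial}{\partial v}\left(P\,\frac{\partial x}{\partial u}\right)\right] du\,dv. \tag{15.7-5}$$

To this integral we now apply Green’s theorem in the uv-plane. As a result,
(15.7-5) is equal to the line integral around the boundary $\Gamma$ of $G$,

$$\int_\Gamma P\,\frac{\partial x}{\partial u}\,du + P\,\frac{\partial x}{\partial v}\,dv.$$

But this is just

$$\int_C P\,dx,$$

and so we have completed the proof of (15.7-2) under the assumptions on $S$ as
stated earlier. We have assumed that $P$, $Q$, and $R$ have continuous partial
derivatives in some region containing $S$.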
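
The parametric form of the proof translates directly into a numerical check. The sketch below is a hedged illustration, not part of the text: it assumes the surface $z = 1 - x^2 - y^2$ over the unit disk, parametrized by polar co-ordinates in the xy-plane, and the field $\mathbf{F} = z\mathbf{i} + x\mathbf{j} + y\mathbf{k}$, whose curl is $(1, 1, 1)$. With the + sign in (15.7-3), the surface integral of $(\nabla\times\mathbf{F})\cdot\mathbf{n}$ becomes the integral of $(\nabla\times\mathbf{F})\cdot(j_1, j_2, j_3)$ over $G$. Both sides should be close to $\pi$.

```python
import math

def surface_point(r, t):
    """Illustrative surface (an assumption): z = 1 - x^2 - y^2 over the unit disk."""
    return (r * math.cos(t), r * math.sin(t), 1.0 - r * r)

def curl_F(x, y, z):
    """Curl of the illustrative field F = z i + x j + y k."""
    return (1.0, 1.0, 1.0)

def jacobians(r, t, h=1e-6):
    """Central-difference estimates of j1, j2, j3 for the parameters (u, v) = (r, t)."""
    pu = [(a - b) / (2 * h) for a, b in zip(surface_point(r + h, t), surface_point(r - h, t))]
    pv = [(a - b) / (2 * h) for a, b in zip(surface_point(r, t + h), surface_point(r, t - h))]
    j1 = pu[1] * pv[2] - pu[2] * pv[1]  # d(y, z)/d(u, v)
    j2 = pu[2] * pv[0] - pu[0] * pv[2]  # d(z, x)/d(u, v)
    j3 = pu[0] * pv[1] - pu[1] * pv[0]  # d(x, y)/d(u, v)
    return j1, j2, j3

def surface_integral(n=200):
    """Midpoint rule for the integral of curl F . (j1, j2, j3) over G."""
    dr, dt = 1.0 / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        for k in range(n):
            r, t = (i + 0.5) * dr, (k + 0.5) * dt
            c = curl_F(*surface_point(r, t))
            total += sum(a * b for a, b in zip(c, jacobians(r, t))) * dr * dt
    return total

def line_integral(n=2000):
    """Integral of z dx + x dy + y dz around the boundary circle r = 1, counterclockwise."""
    dt = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        x, y, z = surface_point(1.0, t)
        dx, dy, dz = -math.sin(t) * dt, math.cos(t) * dt, 0.0
        total += z * dx + x * dy + y * dz
    return total

print(surface_integral(), line_integral(), math.pi)
```

The polar parametrization is not one-to-one at $r = 0$, so it falls slightly outside the stated hypotheses; the numerical agreement is unaffected because that set contributes nothing to the integral.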

Stokes’s theorem may be extended to more general surfaces by a process
entirely similar to that employed in the proof of Green’s theorem in the plane.
The process is suggested by Fig. 161. It consists in dividing $S$ into a finite
number of simple surface elements by the construction
of one or more “cuts,” or interior dividing lines. We
assume that each element and its boundary takes its
orientation from the overall orientation of $S$ and $C$,
and that each “cut” occurs as part of the boundary
of just two surface elements, with opposite orientations
in the two cases. If now we add the formulas
of Stokes’s theorem for the several elements, the contributions from the cuts to the
line integrals cancel each other in pairs, and we obtain (15.7-1) as the final result.

Fig. 161.

Notice that it is taken for granted that there is an orientation of the boundary
$C$ of $S$, and a corresponding orientation of $S$ itself, that is, a designation of a
positive side of $S$. The assumption that such orientation of $S$ is possible is a
restriction placed on $S$, for there are surfaces which cannot be oriented, as we
shall presently show.

The concept of orientation of a surface may be developed in the following 
way: We limit our discussion to surfaces which may be thought of as formed out 
of a finite number of simple surface elements in such a way that if two elements 
have an arc of common boundary, then no other element has any sub-arc of this 
arc as part of its boundary. Moreover, we require that if two elements have any 
points in common, the common points shall form a finite number of arcs on the 
boundary of each element. A simple surface element is oriented by assigning a 
positive direction to the simple closed curve forming its boundary. If S is a 
surface formed out of elements in the manner described, and if each element is 
given an orientation, we shall say that the elements are coherently oriented if the 
following two conditions are fulfilled: 

(a) If $\Gamma$ is an arc of common boundary between two elements $S_1$, $S_2$, then the
orientation of $\Gamma$ in $S_1$ is opposite to its orientation in $S_2$;

(b) If $\Gamma$ is a simple closed curve forming part of the boundary of the whole surface
$S$, then the orientations given to various arcs of $\Gamma$ by the several elements of $S$
shall all be consistent and give an orientation to $\Gamma$ as a whole.

Fig. 162.

If the elements forming $S$ can be coherently oriented, we shall say that $S$ is
orientable. Otherwise, $S$ is said to be nonorientable. These definitions apply to
both closed and nonclosed surfaces. When S is orientable we can designate a 
positive side of S by designating a positive side to each element in accord with 
the orientation of the boundary of the element, as explained near the beginning of 
this section. But if S is nonorientable we cannot designate a positive side of S, 
for there will be two adjacent elements of S for which the designations of the 
positive sides will be in conflict as we pass from one element to the other. A 
nonorientable surface is in fact one-sided , whereas an orientable surface is 
two-sided. The simplest nonorientable surface is the Moebius band, represented 
in Fig. 162. The student can easily make a model of this surface by cutting a long 
narrow strip of paper, and gluing the two ends together after giving one end a 
half twist. The Moebius band has a single simple closed curve C as boundary. 
Stokes’s theorem may be given a vector form, as follows: 

THEOREM XI. Let $S$ be an orientable surface as described above, formed from
smooth surface elements with sectionally smooth boundaries. Let $\mathbf{F}$ be a
vector field which is continuously differentiable in some open set containing
$S$. Let $\mathbf{n}$ be the unit normal vector on the positive side of $S$. Then if $S$ is not
closed, and if $C$ is the boundary of $S$,

$$\iint_S (\nabla\times\mathbf{F})\cdot\mathbf{n}\,dA = \int_C \mathbf{F}\cdot\mathbf{T}\,ds, \tag{15.7-6}$$

where $\mathbf{T}$ is the unit vector tangent to $C$ in the positive sense. If $S$ is closed, the
surface integral in (15.7-6) has the value zero.


The form (15.7-6) is equivalent to (15.7-1) with $\mathbf{F} = P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}$, since, by
(14.3-1),

$$\mathbf{T} = \frac{dx}{ds}\,\mathbf{i} + \frac{dy}{ds}\,\mathbf{j} + \frac{dz}{ds}\,\mathbf{k}.$$


If S is closed, the line integrals around the boundaries of the elements 
forming S all cancel out, because S has no boundary in this case. Hence, the 
surface integral over S must vanish. 


EXERCISES 

1. Let $C$ be the curve of intersection of $x + y = 2b$ and $x^2 + y^2 + z^2 = 2b(x + y)$,
oriented in the clockwise sense as viewed from the origin. Use Stokes’s theorem to find
the value of $\int_C y\,dx + z\,dy + x\,dz$.

2. Let $S$ be the part of the surface $z = x^2 - y^2$ inside the cylinder $x^2 + y^2 = 1$, and let
the positive side of $S$ be such that $\gamma$ is acute. Show that, if $C$ is the boundary of $S$,

$$\iint_S (x\cos\alpha - y\cos\beta)\,dA = \int_C xy\,dz$$

is a special case of Stokes’s theorem. Calculate the values of the line integral and the
surface integral independently, without use of Stokes’s theorem, and verify that they are
equal. Observe that $C$ may be parametrized by setting $x = \cos\theta$, $y = \sin\theta$.

3. Calculate the value of $\int_C y\,dx + z\,dy + x\,dz$ where $C$ is the curve of intersection of
$bz = xy$ and $x^2 + y^2 = a^2$, oriented counterclockwise around the cylinder as viewed from a
point high up on the positive z-axis. Use Stokes’s theorem and then convert to a double
integral over a circle in the xy-plane.

4. Let $C$ be the curve in which the cylinder $x^2 + y^2 = a^2$ is intersected by a plane
parallel to the x-axis, and let $C$ be oriented counterclockwise around the cylinder as
viewed from a point high on the positive z-axis.

(a) Show that $\int_C z(x^2 - 1)\,dy + y(x + 1)\,dz = 0$.

(b) Show that $\int_C y(z - 1)\,dx + x(z + 1)\,dy = 2\pi a^2$.

(c) If the plane is $y + z = a$, show that $\int_C z\,dx - x\,dz = 2\pi a^2$.

(d) What is the value of the integral in part (c) if the plane is $z = 2(a + y)$?

5. Calculate $\int_C y^2\,dx + xy\,dy + zx\,dz$, where $C$ is the curve of intersection of
$x^2 + y^2 = 2ay$ and $y = z$.

6. Let $S$ be the first octant portion of the surface of the sphere $x^2 + y^2 + z^2 = 4a^2$
which is inside the cylinder $x^2 + y^2 = 2ax$. Let the outer side of the sphere be the positive
side of $S$. Calculate the integrals

(a) $\int_C z\,dx - x\,dz$, (b) $\int_C x\,dy - y\,dx$, (c) $\int_C y\,dz - z\,dy$,

where C is the boundary of S, oriented as in Stokes’s theorem. Observe that, by suitable 
use of projection in evaluating the surface integrals which result from using Stokes’s 
theorem, each line integral is equal to twice the area of an appropriate plane region. 

7. If $S$ is a simple surface element whose equation is $z = f(x, y)$, with $(x, y)$ varying
over a plane region $R$, let $C$ be the boundary of $S$ and $\Gamma$ the boundary of $R$. Prove
Stokes’s theorem for $S$ by converting the line integral over $C$ to a line integral over $\Gamma$ and
the integral over $S$ to an integral over $R$. Then use Green’s theorem in the plane.


15.8 / EXACT DIFFERENTIALS IN THREE VARIABLES 

An expression such as 

Pdx + Qdy +Rdz (15.8-1) 

is called a first-order differential form in three variables. Here P, Q, R denote 
functions of x , y, z. In this section we consider exact differentials and line 
integrals independent of the path. 

Definition. The differential form (15.8-1) is said to be exact at the point $(a, b, c)$ if
there is some single-valued differentiable function $u = f(x, y, z)$ defined in some
neighborhood of $(a, b, c)$ such that

$$du = P\,dx + Q\,dy + R\,dz$$


at all points of the neighborhood. 
The concept of an exact differential can be expressed in vector language,
using the notion of the gradient of a scalar function. Let $\mathbf{F}$ be a vector field. It is
called a gradient field at a particular point in case there is a scalar differentiable
function $u$ defined in some neighborhood of that point, with the property that $\mathbf{F}$ is
the gradient of $u$ in that neighborhood. In terms of the rectangular xyz-coordinate
system, suppose that

$$\mathbf{F} = P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}. \tag{15.8-2}$$

Then $\nabla u = \mathbf{F}$ means that

$$\frac{\partial u}{\partial x} = P, \qquad \frac{\partial u}{\partial y} = Q, \qquad \frac{\partial u}{\partial z} = R,$$

or, equivalently, that

$$du = P\,dx + Q\,dy + R\,dz. \tag{15.8-3}$$

Therefore , the differential form (15.8-1) is exact at a point if and only if F is a 
gradient field at that point. 

We now state a theorem about line integrals independent of the path. 


THEOREM XII. Suppose that $P$, $Q$, $R$ are continuous in some region, and that in
this region there is defined a single-valued differentiable function $u =
f(x, y, z)$ such that (15.8-3) holds at all points of the region. Then, if $C$ is any
sectionally smooth curve in the region, with initial point $(x_0, y_0, z_0)$ and
terminal point $(x_1, y_1, z_1)$, the equation

$$\int_C P\,dx + Q\,dy + R\,dz = f(x_1, y_1, z_1) - f(x_0, y_0, z_0) \tag{15.8-4}$$

holds. Therefore the line integral is independent of the path from $(x_0, y_0, z_0)$ to
$(x_1, y_1, z_1)$.


This theorem corresponds to Theorem V, §15.4, and is proved in the same 
way. 

Our next theorem corresponds to Theorem IV, §15.4. 


THEOREM XIII. Let $D$ be an open region in the shape of a rectangular box,
each face of which is parallel to a co-ordinate plane. Suppose $P$, $Q$, $R$ are
defined and have continuous first partial derivatives in $D$. Then, in order that
there shall be a single-valued differentiable function $u = f(x, y, z)$ defined in $D$
such that $du = P\,dx + Q\,dy + R\,dz$ at each point, it is necessary and sufficient
that

$$\frac{\partial R}{\partial y} - \frac{\partial Q}{\partial z} = 0, \qquad \frac{\partial P}{\partial z} - \frac{\partial R}{\partial x} = 0, \qquad \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} = 0 \tag{15.8-5}$$

at each point. Such a function can then be found by calculating the line
integral of the differential form from an arbitrary fixed point $(x_0, y_0, z_0)$ to the
variable point $(x, y, z)$ along any path lying in $D$.
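Proof. The conditions (15.8-5) are necessary, for if the function $u$ exists we
have

$$\frac{\partial u}{\partial z} = R, \qquad \frac{\partial u}{\partial y} = Q, \qquad \frac{\partial^2 u}{\partial y\,\partial z} = \frac{\partial^2 u}{\partial z\,\partial y},$$

and so $\dfrac{\partial R}{\partial y} = \dfrac{\partial Q}{\partial z}$; similar arguments show that all the equations (15.8-5) must be
satisfied.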

Let us now start with the assumption that equations (15.8-5) hold in $D$.
Choose any fixed point $(x_0, y_0, z_0)$ in $D$ and let $(x_1, y_1, z_1)$ be any other point in $D$.
Let $C$ be the straight line segment from $(x_0, y_0, z_0)$ to $(x_1, y_1, z_1)$ and define

$$f(x_1, y_1, z_1) = \int_C P\,dx + Q\,dy + R\,dz. \tag{15.8-6}$$

This gives us a definition for the value of the function at each point of $D$. It
remains only to show that

$$\frac{\partial f}{\partial x} = P, \qquad \frac{\partial f}{\partial y} = Q, \qquad \frac{\partial f}{\partial z} = R. \tag{15.8-7}$$

Now suppose that $(a, b, c)$ is any third point of $D$, and let $C_1$, $C_2$ be the
segments as shown in Fig. 163. We shall show that

$$f(x_1, y_1, z_1) = f(a, b, c) + \int_{C_2} P\,dx + Q\,dy + R\,dz.$$

This is equivalent to showing that

$$\int_C = \int_{C_1} + \int_{C_2} \tag{15.8-8}$$

(with $P\,dx + Q\,dy + R\,dz$ understood as appearing under each integral sign). This
in turn is equivalent to showing that

$$\int_\Gamma P\,dx + Q\,dy + R\,dz = 0, \tag{15.8-9}$$

where $\Gamma$ is the oriented path all the way around the triangle in one direction. If $S$
is the plane triangular region bounded by $\Gamma$, we see that (15.8-9) holds by virtue
of (15.8-5) and Stokes’s theorem (15.7-1).



Fig. 163.

Let us now consider $x_1$ as a variable, and $y_1$, $z_1$ as fixed, and let us take $C_2$
parallel to the x-axis, so that $b = y_1$, $c = z_1$. Then (15.8-8) becomes

$$f(x_1, y_1, z_1) = f(a, y_1, z_1) + \int_a^{x_1} P(x, y_1, z_1)\,dx.$$

From this equation it is clear that

$$\frac{\partial f(x_1, y_1, z_1)}{\partial x_1} = P(x_1, y_1, z_1).$$

We have thus proved the first formula in (15.8-7). The other two are proved in
the same way. This completes the proof of Theorem XIII. It will be observed
that, for the proof we have given, the open region $D$ need not be a rectangular
box, provided it has the property that any two points of $D$ can be joined by a
line segment lying wholly in $D$. Such a region is called convex.


The substance of Theorems XII and XIII is capable of succinct statement in 
vector notation. Theorem XII says that 

$$\int_C \mathbf{F}\cdot\mathbf{T}\,ds = f(x_1, y_1, z_1) - f(x_0, y_0, z_0)$$

provided $\mathbf{F} = \nabla f$ in the region where C lies. Theorem XIII says that F is the
gradient of some function f in the region D if and only if $\nabla \times \mathbf{F} = 0$ in D. For,
with F defined by (15.8-2), equations (15.8-5) mean precisely that $\nabla \times \mathbf{F} = 0$.

Example. Of what function is

$$(y^2 + 2z^2x - 1)\,dx + 2yx\,dy + 2zx^2\,dz$$

the differential?

This differential form satisfies the conditions (15.8-5) at all points. We shall
compute a function f(x, y, z) by integrating from (0, 0, 0) to (x, y, z) along a
broken line as shown in Fig. 164. We use (r, s, t) as co-ordinates of a variable
point along the path. Then

$$f(x, y, z) = \int (s^2 + 2t^2r - 1)\,dr + 2sr\,ds + 2tr^2\,dt
= \int_0^x -\,dr + \int_0^y 2sx\,ds + \int_0^z 2tx^2\,dt,$$

$$f(x, y, z) = -x + y^2x + z^2x^2.$$

Fig. 164.

The differential of this function is the originally given differential form.
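
The result of the Example can be checked numerically. The following Python sketch compares central-difference partial derivatives of f(x, y, z) = -x + y^2 x + z^2 x^2 with the coefficients P, Q, R of the given form. The function names, the step size h, and the sample points are illustrative choices, not part of the text.

```python
def f(x, y, z):
    """Candidate potential found in the Example."""
    return -x + y**2 * x + z**2 * x**2


def form(x, y, z):
    """Coefficients (P, Q, R) of the differential form in the Example."""
    P = y**2 + 2 * z**2 * x - 1
    Q = 2 * y * x
    R = 2 * z * x**2
    return P, Q, R


def numerical_gradient(func, x, y, z, h=1e-6):
    """Central-difference approximation to the three first partials of func."""
    fx = (func(x + h, y, z) - func(x - h, y, z)) / (2 * h)
    fy = (func(x, y + h, z) - func(x, y - h, z)) / (2 * h)
    fz = (func(x, y, z + h) - func(x, y, z - h)) / (2 * h)
    return fx, fy, fz


def max_gradient_error(points):
    """Largest discrepancy between the numerical gradient of f and (P, Q, R)."""
    worst = 0.0
    for (x, y, z) in points:
        approx = numerical_gradient(f, x, y, z)
        exact = form(x, y, z)
        worst = max(worst, max(abs(a - e) for a, e in zip(approx, exact)))
    return worst


sample_points = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (-1.5, 0.5, 2.0), (2.0, -1.0, -0.5)]
print(max_gradient_error(sample_points))  # a small number, limited by rounding
```

Agreement at a few sample points is only a sanity check, not a proof; the proof is the direct differentiation indicated in the text.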

Theorem XIII is not true for all open regions D. Consider, for example, the
differential form

$$\frac{-y\,dx + x\,dy}{x^2 + y^2},$$

where D is all of space except the z-axis. In this region the equations (15.8-5)
are satisfied, and yet there is no single-valued function defined in D whose
differential is equal to the foregoing differential form. The explanation lies in
Example 1, §15.41. The essential trouble is that, if a closed curve C in D
encircles the z-axis, it is impossible to find a surface S in D, with C as a
boundary, to which we can apply Stokes’s theorem.

To get away from difficulties of this kind we need to develop a notion 
which for space of three dimensions is analogous to the notion of a simply 
connected region in two dimensions, as explained in §15.41. We shall not attempt 
any systematic developments of this kind, however. 


EXERCISES 

1. In each of the following cases determine, by inspection or otherwise, a function 
whose differential is the differential form under the integral sign. Then use Theorem XII 
to evaluate the line integral over the specified path. 

(a) $\int yz\,dx + zx\,dy + xy\,dz$, from (1, 2, 3) to (4, 5, 6);

(b) $\int 2xy^2z\,dx + 2x^2yz\,dy + x^2y^2\,dz$, from (0, 0, 0) to (a, b, c);

(c) $\int 2xyz^3\,dx + x^2z^3\,dy + 3x^2yz^2\,dz$, from (1, 1, 1) to (p, q, r);

(d) $\int yz\cos(xz)\,dx + \sin(xz)\,dy + xy\cos(xz)\,dz$, from $(\pi^2/4, 1, 2/\pi)$ to $(\pi/2, -2, 3)$.

2. Let $u = x/r^2$ where $r^2 = x^2 + y^2 + z^2$, and let $\mathbf{F} = \nabla u$. Compute
$\int_C \mathbf{F}\cdot\mathbf{T}\,ds$, where C is any curve from (1, 0, 0) to (a, b, c) not passing
through the origin.

3. Find a function whose differential is $2xy\,dx + (x^2 + \log z)\,dy + (y/z)\,dz$, by
integrating along a path from (0, 0, 1) to (x, y, z). Solve the problem in several ways:
(a) Use a broken-line path from (0, 0, 1) to (x, 0, 1) to (x, y, 1) to (x, y, z); (b) use a
broken-line path from (0, 0, 1) to (0, 0, z) to (0, y, z) to (x, y, z); (c) use the
straight-line path from (0, 0, 1) to (x, y, z).

4. Find a function whose differential is 


2x 


-dx + 2l 
z z 




by integrating over some path from (1, 1, 1) to (x, y, z). 

5. Find a function whose differential is 


$$e^{-xy}\left[(y - xy^2 + yz)\,dx + (x - x^2y + xz)\,dy - dz\right],$$
by integrating over some path from (0, 0, 0) to (x, y, z). 

6. Let C be the circle $x^2 + y^2 = a^2$, z = 0. If P(x, y, z) is any point, let P' be the
point (x, y, 0), and let Q be the point in which the ray OP' (produced if necessary)
intersects C. Let $\phi$ be the angle between the directions OQ and QP (see Fig. 165),
with the convention that $0 < \phi < \pi$ if z > 0, $-\pi < \phi < 0$ if z < 0, and, if z = 0,
that $\phi = 0$ if $x^2 + y^2 > a^2$ and $\phi = \pi$ if $x^2 + y^2 < a^2$. This leaves $\phi$ undefined at
points of the circle C, but it is defined everywhere else.

Fig. 165.

(a) Show that, in general,

$$d\phi = \frac{-xz\,dx - yz\,dy}{\sqrt{x^2 + y^2}\,\left[(\sqrt{x^2 + y^2} - a)^2 + z^2\right]}
+ \frac{(\sqrt{x^2 + y^2} - a)\,dz}{(\sqrt{x^2 + y^2} - a)^2 + z^2}.$$

(b) At what points is $\phi$ discontinuous?

(c) At what points is $\phi$ continuous but not differentiable?

(d) If D is the region consisting of all of space except the z-axis and the circle C, show
that the differential form in (a) satisfies conditions (15.8-5) in D.

(e) Describe a closed curve in D which is not the boundary of any surface lying in D and
to which Stokes’s theorem may be applied.


MISCELLANEOUS EXERCISES 

1. If S is defined by z = f(x, y), with (x, y) ranging over R, and if the positive side of
S is chosen so that $\cos\gamma > 0$, show that

$$\iint_S (x\cos\alpha - y\cos\beta)\,dA = \iint_R \left(y\,\frac{\partial z}{\partial y} - x\,\frac{\partial z}{\partial x}\right) dx\,dy.$$

2. Find a function whose differential is 

(e x cos y dx - ( e x sin y - 7 sec 2 y) dy. 

\ Vl-x / 

3. Suppose a >0, b >0. Let P be a point of intersection of y 2 = -4a 2 (x - a 2 ) and 
y 2 = 4b 2 (x + b 2 ), and let R be the region bounded by the first parabola, the x-axis, and the 


line OP. Show that 


it 


_ 2 a \y by using the transformation x = u 2 - u 2 , y = 2 uv. 

Vx + y 


4. Use (15.4-6) with a = b = 0 to find a function u such that

$$du = \frac{4xy\,dx + 2(1 + y^2 - x^2)\,dy}{(x^2 + y^2 - 1)^2 + 4y^2}.$$

The function so found has certain discontinuities. Where are they?

5. Find a region in the $r\theta$-plane which is mapped, by $x = r\cos\theta$, $y = r\sin\theta$, into the
region R between $x^2 + y^2 = 1$, $x^2 + y^2 = 4$, and inside $x^2 + y^2 = 2x$. Hence calculate

ISvWp dxdy - 

R 

6. Find a first-quadrant region in the uu-plane which maps into the region R defined 

by 1 ^ x 2 + y 2 ^ 4, y ^ 0, if the mapping is x = u 2 - v 2 , y = 2 uv. Calculate J J by 

R 

transforming to the mu - plane, and check your result by calculating the given integral in 
terms of polar co-ordinates. 

7. Calculate J J cos dx dy ’ w ^ ere ^ the tr i an gi e bounded by x = y, 

R 

x + y = 0, and x - 2y = 2. Use the transformation u = 2x - y, v = x - 2y. 

8. Let S be the surface of the elliptic cone $(x^2/a^2) + (y^2/b^2) = (z^2/c^2)$ between z = 0
and z = c. Show that the centroid is at $(0, 0, \tfrac{2}{3}c)$. Show that both the area and the first
moment of S can be expressed in terms of the same complete elliptic integral of the second
kind. For a hint as to a useful procedure, see Exercise 5, §15.32.

9. Show that the potential at O due to a uniform charge on the surface of the torus
$x = (a + b\cos\phi)\cos\theta$, $y = (a + b\cos\phi)\sin\theta$, $z = b\sin\phi$ is expressible in terms of the
standard elliptic integrals

$$\int_0^{\pi/2} \sqrt{1 - k^2\sin^2\phi}\;d\phi, \qquad \int_0^{\pi/2} \frac{d\phi}{\sqrt{1 - k^2\sin^2\phi}},$$

where

$$k^2 = \frac{4ab}{(a + b)^2}.$$


10. Find the area of the part of the surface (x 2 + y 2 + z 2 ) 2 = x 2 - y 2 in the first octant. 



16 / POINT-SET THEORY 


16 / PRELIMINARY REMARKS 

This chapter is a continuation of the study of limits and convergence. Chapter 2 
may be regarded as part of the theory of point sets on a straight line. In Chapter 
5 we began some study of point sets in a plane. Point-set theory is fundamental 
in the study of functions. Our primary aim in this chapter is to develop enough 
theory to deal with our later study of continuous functions, integration, and the 
theory of convergence of sequences and series. The principal theorems are the 
Bolzano-Weierstrass theorem, Cauchy’s convergence condition, and the Heine- 
Borel theorem. 

16.1 / FINITE AND INFINITE SETS 

A set of points is defined by some condition which distinguishes the points which 
belong to the set from those which do not belong to it. The points which belong 
to a set are called elements of the set. In the study of limit concepts we are often 
obliged to deal with sets having an infinite number of elements. A set is called a 
finite set if there is some positive integer n such that the set has exactly n 
elements. 

Sometimes we may define a set of points by a condition which is such that 
no points satisfy the condition. In this case we call the set empty, or void. The 
empty set is considered to be a finite set, the number of its elements being 0. 

A set which is not finite is called infinite. 

Example 1. Let S be the set of all points (x, y) in the xy-plane which are 
such that x and y are integers and x 2 +y 2 <100. This is a finite set. Without 
determining exactly how many elements there are in S, we can easily see that the 
number does not exceed (19) 2 = 361. 
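
The bound in Example 1 can be confirmed by brute force. The following Python sketch lists the integer points with x^2 + y^2 < 100 and compares their number with the bound (19)^2 = 361; the function name and its parameter are illustrative, not part of the text.

```python
from math import isqrt


def lattice_points_in_open_disc(radius_squared):
    """All integer points (x, y) with x^2 + y^2 < radius_squared."""
    bound = isqrt(radius_squared)  # |x| and |y| cannot exceed this
    return [
        (x, y)
        for x in range(-bound, bound + 1)
        for y in range(-bound, bound + 1)
        if x * x + y * y < radius_squared
    ]


S = lattice_points_in_open_disc(100)
print(len(S))            # 305
print(len(S) <= 19**2)   # True: the set is finite and within the stated bound
```

The bound 361 arises because x and y must each lie among the 19 integers from -9 to 9; the exact count is smaller because the corners of that square fall outside the circle.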

Example 2. Let S be the set of all points (x, y) inside the circle (x - 2) 2 + y 2 = 
4 and such that y >x + 1. This set is void, for the circle lies entirely below the 
line y = x + 1. 

Perhaps the simplest infinite set is the set whose elements are the positive 
integer points 1, 2, 3, . . . on the real number scale. Let us, for the time being, 
denote this set by the letter I. 

Definition . A point set S is called denumerable , or denumerably infinite , if there is 
a one-to-one correspondence between the elements of S and the elements of the 
set I of positive integers. 

The existence of a one-to-one correspondence between the elements of S 
and those of I means that each point of S can be labeled with a unique positive
integer n, with no two distinct points having the same label n, and that every
positive integer is used as the label for some point. The elements of S can then
be symbolized by $P_1, P_2, P_3, \ldots$.

The words enumerable and countable are synonyms for denumerable. 

Infinite sets may be classified into those which are denumerable and those 
which are not. It can be shown that the set consisting of all the real numbers is 
nondenumerable, whereas the set consisting merely of the rational numbers is 
denumerable. The demonstrations of these facts may be based on Examples 3 
and 4, below. 

If a set is infinite, we can select out of it a subset which is denumerable. This 
simple but important fact is used in many arguments. 

Example 3. The set S of all rational numbers x such that 0 < x < 1 is 
denumerable. 

To demonstrate the denumerability of S, let us make a display as follows: 

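$$\begin{array}{cccccc}
\tfrac{1}{2} & \tfrac{1}{3} & \tfrac{1}{4} & \tfrac{1}{5} & \tfrac{1}{6} & \cdots \\
\tfrac{2}{3} & \tfrac{2}{5} & \tfrac{2}{7} & \tfrac{2}{9} & \tfrac{2}{11} & \cdots \\
\tfrac{3}{4} & \tfrac{3}{5} & \tfrac{3}{7} & \tfrac{3}{8} & \tfrac{3}{10} & \cdots \\
\cdots & & & & &
\end{array}$$

The numerators in the successive rows of the display are 1, 2, 3, . . . . The
denominators in each row are increasing, but such that each fraction is proper
and in its lowest terms. A little reflection shows us that each member of S
appears just once in this array, and that each member of the array is an element
of S. By labeling the elements $x_1, x_2, x_3, x_4, \ldots$ as shown in the following array,
we see that S is denumerable.

$$\begin{array}{cccccc}
x_1 & x_2 & x_4 & x_7 & x_{11} & \cdots \\
x_3 & x_5 & x_8 & x_{12} & \cdots & \\
x_6 & x_9 & \cdots & & & \\
x_{10} & \cdots & & & &
\end{array}$$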
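
The diagonal labeling of Example 3 can be carried out mechanically. The following Python sketch builds the rows of the display (numerator n, increasing denominators, lowest terms) and reads them off along successive diagonals, so that the k-th fraction produced plays the role of x_k. The function names are illustrative, not part of the text.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd


def row(numerator):
    """Row of the display: numerator/d in lowest terms, d > numerator, d increasing."""
    for denominator in count(numerator + 1):
        if gcd(numerator, denominator) == 1:
            yield Fraction(numerator, denominator)


def diagonal_enumeration():
    """Yield x_1, x_2, x_3, ... by sweeping the array along successive diagonals."""
    generators = []  # one generator per row
    rows = []        # the part of each row computed so far
    for k in count(1):
        generators.append(row(k))
        rows.append([])
        # Diagonal k needs entry (k - i) of row i + 1, for i = 0, ..., k - 1.
        for i in range(k):
            while len(rows[i]) < k - i:
                rows[i].append(next(generators[i]))
        for i in range(k):
            yield rows[i][k - 1 - i]


first_ten = list(islice(diagonal_enumeration(), 10))
print([str(q) for q in first_ten])
# ['1/2', '1/3', '2/3', '1/4', '2/5', '3/4', '1/5', '2/7', '3/5', '4/5']
```

Every proper fraction in lowest terms sits at a definite row and column, hence on a definite diagonal, so it receives its label after finitely many steps; this is the content of the one-to-one correspondence.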


Example 4. Let S be the set of all nonterminating decimal fractions of the
form $0.a_1a_2a_3\ldots$, where $a_1, a_2, \ldots$ represent digits 0, 1, . . . , 9, and the case in
which all the a’s from some point onward are 0 is ruled out. This set is
nondenumerable.

If S were denumerable, we could label all its members by a double subscript
scheme as follows:

1st element: $0.a_{11}a_{21}a_{31}\cdots$
2d element: $0.a_{12}a_{22}a_{32}\cdots$
3d element: $0.a_{13}a_{23}a_{33}\cdots$

This leads to a contradiction, however, for there are elements of S not included
in the foregoing list. In fact, consider the decimal $0.c_1c_2c_3\ldots$, where the c’s are
chosen from 0, 1, . . . , 9 according to the rules: $c_1 \ne a_{11}$, $c_2 \ne a_{22}$, $c_3 \ne a_{33}, \ldots$, and
each $c_i$ is different from 0. This decimal belongs to S, but is different from each
decimal in the proposed enumeration of the elements of S. Therefore, S is
nondenumerable.
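
The diagonal construction of Example 4 can be illustrated on a finite list. The following Python sketch takes the first n digits of n listed decimals and produces digits c_1, ..., c_n with c_i different from the i-th digit of the i-th decimal and different from 0. The particular rule (use 1 unless the diagonal digit is 1, in which case use 2) and the sample list are assumptions made for illustration; the text only requires that the two conditions hold.

```python
def diagonal_digits(listed):
    """Digits c_1 ... c_n with c_i != (i-th digit of the i-th decimal) and c_i != 0.

    `listed` is a list of digit strings, the i-th string holding at least i digits.
    """
    digits = []
    for i, decimal_digits in enumerate(listed):
        a_ii = int(decimal_digits[i])     # the digit on the diagonal
        c_i = 1 if a_ii != 1 else 2       # any nonzero digit other than a_ii would do
        digits.append(str(c_i))
    return "".join(digits)


# A hypothetical beginning of a proposed enumeration (digits after the decimal point).
listed = ["333333", "314159", "718281", "999999", "500001", "123456"]
c = diagonal_digits(listed)
print("0." + c)  # 0.121111
```

The decimal so formed differs from the i-th listed decimal in its i-th place, and since none of its digits is 0 it is nonterminating, so it belongs to S but not to the list.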


16.2 / POINT SETS ON A LINE 

In dealing with point sets on a line we shall assume that a number scale has been 
established on the line. We shall usually think of it as the x-axis. Then each point 
is identified by its x-co-ordinate, and we shall refer to a point by referring to the 
corresponding real number. 

Definition. By a neighborhood of a point $x_0$ we mean the set of all points x such
that $x_0 - h < x < x_0 + h$, where h is some positive number. The neighborhood thus
consists of all points between $x_0 - h$ and $x_0 + h$, not including these two points.
For each choice of h we get a neighborhood of $x_0$.

This definition should be compared with the definition of a neighborhood in 
the discussion of point sets in the xy-plane; see §5.1. 

Definition . A point set S is called open if for each point x 0 of S there is some 
neighborhood of x 0 which belongs entirely to S. 

Example 1. If a < b, the set S of all x such that a < x < b is an open set. It is
called an open interval.

Example 2. The set of all rational numbers r such that 0 < r < 1 is not open.
This is because any neighborhood of any point contains both rational and
irrational numbers.

Example 3. Let S be the set of all points on the x-axis except the points 1, 1/2,
1/3, 1/4, . . . . This set is not open, for 0 is in the set, but no neighborhood of 0 lies
wholly in the set.

Definition. In speaking of point sets on a line , if S is a set , the complement of S is 
defined as the set of all the points on the line which are not in S. The complement 
of S is denoted by C(S). 

For instance, if S is the set of Example 1, C(S) consists of all x such that
$x \le a$, together with all x such that $x \ge b$.

Definition. A set is called closed if its complement is open.

Example 4. If a < b, the set of all x such that $a \le x \le b$ is a closed set. It is
called a closed interval.

It is very important to notice that, in our use of “open” and “closed” as
adjectives describing sets, “closed” is not the opposite of “open.” If a set is not
open, that does not mean that it is closed.

Example 5. The set S of all x such that $0 < x \le 1$ is neither open nor closed.
For, no neighborhood of 1 lies in S, which shows that S is not open. And no
neighborhood of 0 lies in C(S), which shows that C(S) is not open, and hence
that S is not closed.

The entire x-axis is an open set, by the definition of openness. The 
complement of this set is the void set. Also, the complement of the void set is 
the entire x-axis. It is convenient to agree, by convention, that the void set is 
open. This makes the entire x-axis a closed set. Thus, the entire x-axis is both 
open and closed, and the same is true of the void set. There are no other sets on 
the line which are both open and closed, however. 

One of the most important concepts in this chapter is that of an
accumulation point of a set.

Definition . Let S be a point set , and let y be a point which is not necessarily in S. 
We call y an accumulation point of S if in each neighborhood of y there is at 
least one point x which is in S and distinct from y. 

In some books the term limit point is used instead of accumulation point. 

A finite set can have no accumulation point, for if S is finite and y is any 
point, it is easily seen that if a small enough neighborhood of y is selected, the 
condition of the definition cannot be fulfilled. An infinite set may have no 
accumulation points, but it may also have one or several or infinitely many. 
There is no requirement that an accumulation point of S shall belong to S, but it 
may happen to belong to S in a particular case. 

Example 6. Let S be the set of all rational numbers. Every point on the 
number scale is an accumulation point of S. 

This follows from the fact, demonstrated in §2.5, that there is a rational
number between any two distinct numbers. As a consequence, if $x_0$ is any
number, every neighborhood of $x_0$ contains many rational numbers, so that $x_0$ is
an accumulation point of S.

In the definition of y as a point of accumulation of S, the requirement of the
definition actually makes it necessary for each neighborhood of y to contain
infinitely many points of S. This is seen as follows: Let $I_1$ be some neighborhood
of y, and let $x_1$ be a point of S in $I_1$, with $x_1 \ne y$. Now choose a neighborhood $I_2$
of y, small enough so that $x_1$ is not in $I_2$. There must be a point $x_2$ of S in $I_2$, with
$x_2 \ne y$. We then repeat the argument, getting smaller and smaller neighborhoods
$I_3, I_4, \ldots$, and points $x_3, x_4, \ldots$ all in S such that, for each k, $x_k$ is in $I_k$ but not in
$I_{k+1}$, and $x_k \ne y$. The infinite sequence of distinct points belongs to the original
neighborhood $I_1$.

When the full import of the foregoing paragraph has been realized by the 
student, he or she will not find it difficult to recognize points of accumulation in 
the situations which come to his or her attention. 

There is an important relation between the concept of a closed set and the 
notion of accumulation points, as we see in the following theorem: 

THEOREM I. A set S is closed provided that each accumulation point of S is a
member of S. Conversely, if S is closed, it contains each of its points of
accumulation.

Proof. Suppose every accumulation point of S belongs to S. If S were not 
closed, its complement would not be open. But, if C(S) were not open, this 
would mean that C(S) contains a point y which has no neighborhood lying 
entirely in C(S). But if a neighborhood of y fails to lie wholly in C(S), it must 
contain a point of S, and since y is in C(S), the point of S cannot be the point y. 
Thus, if C(S ) were not open, it would contain a point y whose every neighbor- 
hood contains a point of S, and thus y would be a point of accumulation of S. 
This would contradict the supposition that every accumulation point of S 
belongs to S. Consequently, C(S) must be open, and S closed. This proves the 
first assertion in Theorem I. 

For the converse we suppose S is closed. If y is any accumulation point of 
S, we have to show that y is in S. If it were not, it would be in the complement 
C(S), which is open. Then some neighborhood of y would also lie in C(S). Such 
a neighborhood could not contain any points of S, and so y could not be an 
accumulation point of S. Thus we arrive at a contradiction unless we conclude 
that y is in S. This completes the proof. 

EXERCISES 

1. For each of the following sets answer the question: Is the set open, closed, or 
neither? 

(a) All x such that x < 1. (d) All rational numbers.

(b) All x such that $x \ge 0$. (e) All irrational numbers.

(c) All x such that either x < 0 or $x \ge 1$.

2. If S is the set of all x such that $0 \le x \le 1$, what points, if any, are points of
accumulation of both S and C(S)?

3. (a) If S is the set of all x such that 0 <x < 1, are there any points of S which are 

points of accumulation of C(S)? (b) Are there any points of C(S) which are points of 

accumulation of S? 

4. Prove that any finite set is closed. 

5. Prove that, if S is open, each of its points is a point of accumulation of S.

6. For each of the following sets find the points of accumulation and specify whether
or not the set is closed.

(a) All numbers of the form $(n - 1)/n$, n = 1, 2, 3, . . . .

(b) All numbers of the form $n/(n^2 + 1)$, n = 0, 1, 2, 3, . . . .

(c) All numbers of the form $(-1)^n n/(n + 1)$, n = 0, 1, 2, 3, . . . .


16.3 / THE BOLZANO-WEIERSTRASS THEOREM 

A point set S on the x-axis is called bounded if there is some finite interval 
which contains all of S. In other words, S is bounded if there exist numbers a, b 
(with a < b) such that $a \le x \le b$ for every x in S.

THEOREM II. Suppose that S is a bounded , infinite set. Then there is at least 

one point of accumulation of S. 

This theorem is generally known as the Bolzano-Weierstrass theorem.
Bernard Bolzano (1781-1848) of Prague was a pioneer in the rigorous study of
point sets and other matters fundamental in analysis. Karl Weierstrass
(1815-1897) was one of the great mathematicians of the nineteenth century.

Proof. For the proof we appeal to the theorem about nested intervals
(Theorem VI, §2.8). We assume that S lies in the closed interval [a, b], and we
denote this interval by $I_1$. Now divide $I_1$ into two equal parts by the midpoint
(a + b)/2, and consider the two closed intervals [a, (a + b)/2], [(a + b)/2, b].
Since every point of S lies in one or the other of these intervals, at least one of
them must contain an infinite number of points of S. Let $I_2$ denote such a one of
the two. We then bisect $I_2$ and obtain a new closed interval $I_3$. By repetition of
the process we generate a nest of closed intervals $\{I_n\}$. By the manner of
construction $I_n$ contains infinitely many points of S, and the length of $I_n$ is
$(b - a)/2^{n-1}$. By the theorem referred to earlier in this paragraph, there is exactly
one point common to all the intervals of the nest. This point, call it z, is an
accumulation point of S, for any given neighborhood of z will contain $I_n$ if n is
sufficiently large. This is true because z is in $I_n$ and the length of $I_n$ approaches 0
as n increases. The neighborhood must therefore contain an infinite number of
points of S.
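
The bisection in the proof can be imitated numerically. A computer cannot test whether a half-interval contains infinitely many points of S, so the following Python sketch substitutes a finite sample and keeps, at each step, the closed half that contains at least as many sample points as the other. This substitution, the function name, and the sample (the numbers 1/n for n up to 10000, whose accumulation point is 0) are assumptions made for illustration only.

```python
def bisect_toward_accumulation(points, a, b, steps):
    """Repeatedly halve [a, b], keeping the closed half richer in sample points.

    Counting sample points is a finite stand-in for the condition in the proof
    that the chosen half contain infinitely many points of S.
    """
    lo, hi = a, b
    for _ in range(steps):
        mid = (lo + hi) / 2
        left = sum(1 for p in points if lo <= p <= mid)
        right = sum(1 for p in points if mid <= p <= hi)
        if left >= right:
            hi = mid   # keep the left closed half
        else:
            lo = mid   # keep the right closed half
    return lo, hi


sample = [1 / n for n in range(1, 10001)]
lo, hi = bisect_toward_accumulation(sample, 0.0, 1.0, 10)
print(lo, hi)  # 0.0 0.0009765625
```

After 10 bisections the interval is the one called I_11 in the proof, of length (b - a)/2^10, and it still closes in on the accumulation point 0 of the sampled set.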

We shall use the Bolzano-Weierstrass theorem as a basis for much of the 
later work in this chapter. 

EXERCISES 

1. Suppose S is a set having the number M as its least upper bound. If M is not a 
member of S, show that it is a point of accumulation of S. Give an example showing that, 
if M does belong to S, it need not be a point of accumulation of S. For the definition of 
least upper bounds see §2.7. 

2. For an alternative proof of Theorem II, let two classes L, R be defined as follows:
A number x belongs to L if there are infinitely many elements of S to the right of x on
the scale; otherwise x belongs to R. Show that L and R form a cut (see §2.4), and that the
cut number is a point of accumulation of S.

16.31 / CONVERGENT SEQUENCES ON A LINE 

The notion of limit of a sequence has been discussed briefly in §1.62. The 
student should review this earlier section at this time. Our present purpose is to 
develop some relations between the concept of a convergent sequence and the 
notion of a point of accumulation of a set. 

If $\{x_n\}$ is a sequence of real numbers, we can consider the set consisting of
all the distinct points on the number scale corresponding to the successive
numbers $x_1, x_2, x_3, \ldots$. It is to be emphasized that the set thus defined is not the
same thing as the sequence. A point set is just a collection of points; a sequence
$\{x_n\}$ is a function defined on the positive integers, with $x_n$ the value of the
function corresponding to the particular positive integer n.

If $\{x_n\}$ is a sequence, the set of values $x_1, x_2, x_3, \ldots$ may be a finite set. This is
true in the example

$$x_n = 1 - (-1)^n,$$

where the set of values consists of the two points 0, 2. If the set of values $x_1, x_2,
x_3, \ldots$ is a finite set, the sequence will be convergent if and only if there is some
integer N such that all the values $x_n$ are the same when $n \ge N$. This is easily
seen by direct consideration of the definition of convergence (§1.62). The case in
which the set of values is infinite is taken up in the theorem which follows:

THEOREM III. Suppose $\{x_n\}$ is a convergent sequence of real numbers, and
suppose that S, the set of distinct points among the values $x_1, x_2, x_3, \ldots$, is an
infinite set. Then S has just one point of accumulation, and this point is the
limit of the sequence.

Proof. Denote the limit by x. Consider any neighborhood of x, say the open
interval from $x - \epsilon$ to $x + \epsilon$, where $\epsilon > 0$. By the definition of convergence there
is some integer N such that $x_n$ is in the above neighborhood if $n \ge N$. Since S is
infinite, there must be infinitely many distinct points of S represented among the
values $x_N, x_{N+1}, x_{N+2}, \ldots$. These are all in the specified neighborhood of x, and so
x must be an accumulation point of S. There can be no other accumulation point.
For, if $y \ne x$, choose neighborhoods $I_x$ and $I_y$ of x and y, respectively, which are
small enough so that no point belongs to both $I_x$ and $I_y$. Then all but a finite
number of points of S are in $I_x$, whereas if y were an accumulation point, an
infinite number of points of S would have to be in $I_y$. This completes the proof
of Theorem III.

The following theorem is useful: 

THEOREM IV. Let S be any point set having an accumulation point y. Then
there is a sequence $\{x_n\}$ such that $\lim_{n\to\infty} x_n = y$, and such that each $x_n$ is an
element of S and the values $x_1, x_2, \ldots$ are all distinct.

Most of the argument needed to prove this theorem has already been given,
in the second paragraph following Example 6, §16.2. All that is necessary is to
specify that the intervals $I_1, I_2, I_3, \ldots$ be chosen so that the length of $I_n$
approaches 0 as $n \to \infty$. This will insure that $x_n \to y$.

We conclude this section with some considerations of subsequences. If $\{x_n\}$
is a sequence, a subsequence is another sequence $\{y_k\}$ formed by dropping out
certain of the $x_n$’s and retaining the rest in the order originally given. If the
indices retained are $n_1, n_2, n_3, \ldots$, where $n_1 < n_2 < n_3 < \cdots$, then $y_1 = x_{n_1}$, $y_2 = x_{n_2}$,
and so on. As particular examples of subsequences we cite:

$x_2, x_4, x_6, x_8, \ldots$ $(n_k = 2k)$,

$x_1, x_2, x_6, x_{24}, \ldots$ $(n_k = k!)$.

If $\{x_n\}$ is a sequence, convergent or not, we may consider a subsequence and
inquire as to its convergence. A sequence may be divergent and yet contain a
convergent subsequence. Indeed, it may contain many convergent subsequences,
each with a different limit.

If a sequence is convergent, every subsequence formed from it is also 
convergent, and they all have the same limit, namely the limit of the original 
sequence. 

There are many uses of the following theorem: 

THEOREM V. Let $\{x_n\}$ be a bounded sequence. Then $\{x_n\}$ contains a convergent
subsequence.

Proof. Let S be the set of distinct points represented by the values $x_1,
x_2, \ldots$. There are two cases to consider, according as S is finite or infinite. If S
is finite, it has no accumulation point. But some point of S must be repeated an
infinite number of times in the sequence, so that for some subsequence we must
have $x_{n_1} = x_{n_2} = x_{n_3} = \cdots$. This subsequence is certainly convergent, the limit
being the repeated value. If S is infinite, it must have at least one accumulation
point, say y, by the Bolzano-Weierstrass theorem. But then, by Theorem IV, y is
the limit of a sequence of distinct terms chosen from S. This sequence might not
be a subsequence of $\{x_n\}$. For example, it might be $x_{10}, x_5, x_{20}, x_{30}, x_{25}, \ldots$, where
the indices are not arranged in increasing order. But, by dropping out the terms
whose indices are not in the natural order we get a subsequence ($x_{10}, x_{20},
x_{30}, \ldots$ in the foregoing example), and this subsequence will converge to y. This
completes the proof.
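
The last step of the proof, dropping the terms whose indices are out of the natural order, can be sketched in Python as follows. The rule used here (keep an index only if it exceeds every index kept so far) is one natural reading of the text, and the function name is illustrative.

```python
def increasing_index_subsequence(indices):
    """Keep an index only if it is larger than the last index already kept."""
    kept = []
    for n in indices:
        if not kept or n > kept[-1]:
            kept.append(n)
    return kept


# The indices from the example in the proof: x_10, x_5, x_20, x_30, x_25, ...
print(increasing_index_subsequence([10, 5, 20, 30, 25]))  # [10, 20, 30]
```

In the infinite case the process never stops producing terms: the indices are distinct because the terms are distinct, and only finitely many positive integers lie below any index already kept, so a larger index must eventually appear. The retained terms form a subsequence of a sequence converging to y, and therefore converge to y as well.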

EXERCISES 

1. Suppose that $\{x_n\}$ is a sequence which is bounded and such that all the values $x_1,
x_2, x_3, \ldots$ are distinct. Assume that the set of these values has just one point of
accumulation, denoted by x. Prove that the sequence is convergent and that the limit is x.

2. Consider the sequence with terms 2, 1/2, 4/3, 1/4, 6/5, . . . , where
$x_n = \tfrac{1}{2}[1 - (-1)^n] + (1/n)$. Find two convergent subsequences with different limits.

3. Give an example of a sequence for which the set of values $x_1, x_2, \ldots$ is finite and
there are three different convergent subsequences with distinct limits.

4. If all the terms of a sequence are distinct, and the set of values has just one point
of accumulation, can the sequence fail to be convergent? Consider the case where
$x_n = n[1 + (-1)^n] + (1/n)$.

16.4 / POINT SETS IN HIGHER DIMENSIONS 

In §5.2 we defined neighborhoods for points in the xy-plane. Then we went on 
to define open sets, closed sets, and some other concepts. It was indicated that 
similar definitions can be made for point sets in three-dimensional space, starting 
from the definition of spherical neighborhoods. 

If S is a point set in the plane, we define an accumulation point of S in
exactly the same way as we did for point sets on the line: A point Q is called an 
accumulation point of S if each neighborhood of Q contains at least one point P 
which is in the set S and distinct from Q. This same definition is also used for 
point sets in three-dimensional space. 

When we analyze and compare the fundamentals of point-set theory for point 
sets in 1, 2, or 3 dimensions, we observe the following things: (1) The verbal 
definition of an open set is the same in all three cases, but the word “neighbor- 
hood” has to be given the interpretation appropriate to the dimensionality. (2) 
The verbal definition of a closed set, namely, that S is closed if C(S) is open, is 
the same in all three cases. (3) Theorem I of §16.2, which states that a set is 
closed if and only if it contains all its points of accumulation, is valid for point 
sets in the plane or in space as well as for point sets on a line. The proof as given 
in §16.2 does not depend upon the dimensionality. 

It must be recognized that a point set on the x-axis may be open when it is 
considered as a point set on the line, but that it will not be open when considered 
as a point set in the xy-plane. This is because of the different meanings of the 
word “neighborhood” in the theories for one and two dimensions, respectively. 
Likewise, for example, the set of all points in the xy-plane for which x 2 + y 2 < 1 
is open in the theory for two dimensions, but is not open when we regard it as a 
point set in the three-dimensional space of points (x, y, z). 

We now turn to the Bolzano-Weierstrass theorem for higher dimensions. 
The statement is exactly the same as that already given in §16.3 (Theorem II). 
Bounded sets in the plane or in space were defined in §5.3. To prove the theorem 
for sets in the xy-plane we shall first discuss a generalization of Theorem VI,
§2.8.

Let $R_1, R_2, \ldots, R_n, \ldots$ be a sequence of closed rectangular regions, with
sides parallel to the co-ordinate axes and such that $R_2$ is contained in $R_1$, $R_3$ is
contained in $R_2$, and so on. Furthermore, let the dimensions of $R_n$ approach 0 as
$n \to \infty$. We shall call such a sequence of rectangles a nest. By an extension of the
argument used in proving Theorem VI, §2.8, it is easy to see that if $\{R_n\}$ is any nest
of closed rectangles, there is one and only one point which is common to all the
rectangles.

Now let S be a bounded infinite point set in the xy-plane. Since S is
bounded, it is contained in some sufficiently large closed rectangle with sides
parallel to the axes, say $R_1$, defined by $a_1 \le x \le b_1$, $c_1 \le y \le d_1$. Now divide $R_1$
into four equal smaller rectangles, by the lines $x = (a_1 + b_1)/2$, $y = (c_1 + d_1)/2$. In
at least one of the four closed smaller rectangles there must be an infinite
number of points of S. Call such a one of the smaller rectangles $R_2$. The same
procedure is now applied to $R_2$. By continuation we obtain a sequence $R_1,
R_2, \ldots, R_n, \ldots$ forming a nest of closed rectangles, and, for each n, there are an
infinite number of points of S in $R_n$. Let P be the unique point belonging to all
the rectangles. This point P is easily seen to be an accumulation point of S.

This method of proof may be suitably extended to apply to point sets in 
three dimensions. 


16.41 / CONVERGENT SEQUENCES IN HIGHER DIMENSIONS 

The concept of a convergent sequence may be extended to the case of 
sequences whose terms are points in space of higher dimensions. 

Definition. Let $\{P_n\}$ be a sequence of points. We say that the sequence is convergent
to the limit P, and write $\lim_{n\to\infty} P_n = P$, if each neighborhood of P
contains all but a finite number of the points $P_1, P_2, P_3, \ldots$.

For some purposes it is convenient to rephrase this definition in terms of 
distances between points. Let d(P, Q) denote the distance between P and Q. 
This distance can be expressed in terms of the co-ordinates of P and Q, but just
now it is more convenient to deal directly with the points and not with the
co-ordinates. The meaning of $\lim_{n\to\infty} P_n = P$ is the following: If $\epsilon > 0$, there is
some integer N such that $d(P_n, P) < \epsilon$ if $N \le n$.

From this definition it can easily be shown that, if $\{P_n\}$ is a sequence in the
xy-plane, with co-ordinates $(x_n, y_n)$, and if P is the point (x, y), then $P_n$ converges
to P if and only if $x_n$ converges to x and $y_n$ converges to y. A like situation
obtains in three dimensions.
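
The co-ordinate criterion can be checked numerically. The following Python fragment is only a sketch: the sample sequence $P_n = (1 + 1/n,\ 2 - 1/n^2)$ and the names `point` and `dist` are chosen here for illustration and do not come from the text. It verifies that each co-ordinate error is at most $d(P_n, P)$, which in turn is at most the sum of the two co-ordinate errors.

```python
import math

def point(n):
    """Hypothetical sample sequence P_n = (1 + 1/n, 2 - 1/n**2), converging to P = (1, 2)."""
    return (1 + 1 / n, 2 - 1 / n**2)

def dist(p, q):
    """Euclidean distance d(P, Q) between two points of the xy-plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

P = (1.0, 2.0)
for n in (1, 10, 100, 1000):
    pn = point(n)
    dx, dy = abs(pn[0] - P[0]), abs(pn[1] - P[1])
    d = dist(pn, P)
    # Each co-ordinate error is at most d, and d is at most dx + dy,
    # so d tends to 0 exactly when both co-ordinate errors tend to 0.
    assert max(dx, dy) <= d <= dx + dy
    print(n, dx, dy, d)
```

The two-sided inequality in the loop is the whole content of the "if and only if" statement: the left half gives convergence of the co-ordinates from convergence of the points, and the right half gives the converse.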

The relations between convergent sequences and points of accumulation are 
essentially the same in higher dimensions as in the case of points restricted to 
the x-axis. The Theorems III, IV, V of §16.31 all remain true in higher 
dimensions. The proofs as given in §16.31 remain valid with only a few minor 
modifications in notation to suit the higher dimensional cases. 

EXERCISES 

1. Let $\{P_n\}$ be the sequence $([1 - (-1)^n]/n,\ [1 + (-1)^n]/n)$. Is it convergent?

2. Let S be the set of points on the curve $y = \sin(1/x)$ $(x \ne 0)$ in the xy-plane. (a) Find
a sequence of points $P_n$ on the x-axis and in S such that $P_n$ converges to (0, 0). (b) Find a
sequence of points $P_n$ on the line y = 1 and in S such that $P_n$ converges to (0, 1). (c) What
points must be adjoined to S in order to get a closed set?

3. Let S be the set of points (x, y) such that x is rational and $0 \le x \le 1$ while y is
irrational and $0 < y < 1$. Is S open, closed, or neither? Describe the totality of points of
accumulation of S.


16.5 / CAUCHY’S CONVERGENCE CONDITION 

The definition of convergence of a sequence is stated in a way which involves 
the limit of the sequence. Therefore one cannot use the definition to prove that a 
sequence is convergent unless one already knows what the limit is. This is often 
inconvenient, for it frequently happens that we want to prove that a sequence is 
convergent, even if we do not know precisely what the limit is. The following 
theorem provides a way out of the difficulty. 

THEOREM VI. Let $\{P_n\}$ be a sequence of points. Let $d(P_m, P_n)$ denote the
distance between $P_m$ and $P_n$. A necessary and sufficient condition that the
sequence be convergent is that $d(P_m, P_n)$ approach 0 as m and n become
infinite. In other words, for each $\epsilon > 0$ there must be some N such that
$d(P_m, P_n) < \epsilon$ whenever $N \le m$ and $N \le n$.

We note that in the case of a sequence $\{x_n\}$ of points on the x-axis, the
distance is

$$d(x_m, x_n) = |x_m - x_n|,$$

so that the condition becomes

$$|x_m - x_n| < \epsilon \quad \text{if } N \le m \text{ and } N \le n.$$

This condition on the sequence is called Cauchy's condition (see the reference 
to Cauchy in §1.61). 

Proof that the condition is necessary. If $P_n \to P$ as $n \to \infty$, and $\epsilon > 0$, choose
N so large that $d(P_n, P) < \epsilon/2$ if $N \le n$. This is possible, by the definition of
convergence. If now $m \ge N$ and $n \ge N$, we have
$d(P_m, P_n) \le d(P_m, P) + d(P, P_n) < (\epsilon/2) + (\epsilon/2) = \epsilon$, for the three points $P_m$, $P_n$, P lie either on
a straight line or at the vertices of a triangle, and in either case the distance
between one pair is not greater than the sum of the distances between the other
two pairs. For points on a line we could write

$$|x_m - x_n| \le |x_m - x| + |x - x_n|,$$

which is an application of the inequality (2.2-9) with $a = x_m - x$, $b = x - x_n$.

Proof that the condition is sufficient. We now assume that Cauchy's
condition is satisfied by the sequence $\{P_n\}$. First we shall prove that the sequence
is bounded. Taking $\epsilon = 1$, Cauchy's condition assures us that there is some N
such that $d(P_m, P_n) < 1$ if m and n are not less than N. In particular, $d(P_N, P_n) < 1$
if $n \ge N$. This means that all points of the sequence with the possible exception
of $P_1, \ldots, P_{N-1}$ are less than unit distance from $P_N$. Certainly, then, there is
some neighborhood with center at $P_N$ large enough to contain all the points of
the sequence, and the sequence must be bounded.

We now apply Theorem V, §16.31 (the extension of this theorem to higher
dimensions was discussed in §16.41). The sequence $\{P_n\}$ contains a convergent
subsequence $P_{n_1}, P_{n_2}, \ldots$. Let the limit of this subsequence be P. We shall
complete the proof by showing that the original sequence is convergent to the
limit P. Suppose $\epsilon > 0$. Choose N so that $d(P_m, P_n) < \epsilon/2$ if $m \ge N$ and $n \ge N$
(Cauchy's condition). Since the subsequence converges to P, we can choose one
of the indices $n_i$ so large that $d(P_{n_i}, P) < \epsilon/2$. We can also, at the same time,
choose it larger than N. Then, if $n \ge N$, we have

$$d(P_n, P) \le d(P_n, P_{n_i}) + d(P_{n_i}, P) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon,$$

by the way things have been arranged. Here we have used the same kind of
inequality involving the distances between three pairs of points that we encountered
in the necessity argument. The proof of Theorem VI is now complete.

Theorem VI is most often used as a tool for carrying on theoretical 
developments. 

EXERCISE 

Let a sequence be defined on the x-axis as follows: $x_1 = 1$, $x_2 = 2$, $x_3 = \tfrac12(x_1 + x_2)$, and
in general $x_{n+1} = \tfrac12(x_{n-1} + x_n)$, n = 2, 3, .... Show that $|x_m - x_n| \le 1/2^{N-1}$ if $N \le m$ and
$N \le n$, so that Cauchy's condition is fulfilled. Suggestion: Note that each term is
midway between the two preceding terms.
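
The bound in this exercise can be explored with exact arithmetic before attempting a proof. The sketch below is an illustration only, not part of the text; the function names are hypothetical, and it checks the inequality for finitely many terms, which of course proves nothing about the whole sequence.

```python
from fractions import Fraction

def averaged_sequence(count):
    """Return [x_1, ..., x_count] with x_1 = 1, x_2 = 2, x_{n+1} = (x_{n-1} + x_n)/2."""
    xs = [Fraction(1), Fraction(2)]
    while len(xs) < count:
        xs.append((xs[-2] + xs[-1]) / 2)
    return xs[:count]

def max_spread_from(xs, N):
    """Largest |x_m - x_n| over all m, n >= N among the computed terms (N is 1-based)."""
    tail = xs[N - 1:]
    return max(tail) - min(tail)

xs = averaged_sequence(30)
for N in range(1, 20):
    bound = Fraction(1, 2 ** (N - 1))
    # Every term from x_N onward lies between x_N and x_{N+1},
    # and those two terms are exactly 1/2**(N-1) apart.
    assert max_spread_from(xs, N) <= bound
```

Using `Fraction` rather than floating-point numbers keeps every term exact, so the comparison with $1/2^{N-1}$ is not blurred by rounding.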


16.6 / THE HEINE-BOREL THEOREM 

In this section we are concerned with an important theorem of point-set theory 
which is widely known by the names of the two mathematicians Eduard Heine 
and Emile Borel. The central idea of this theorem probably originated in 
connection with the concept of uniform continuity. Heine defined this concept, 
and proved the very important theorem that if a function of one variable x is 
continuous at each point of a closed interval $a \le x \le b$, then it is uniformly
continuous on that interval. We shall discuss the concept and the theorem in
Chapter 17, at which time we shall need to appeal to the Heine-Borel theorem. 
You will doubtless find difficulty at first in appreciating the motivation of the 
Heine-Borel theorem; it may be best that you take up the study of the theorem 
when you need it in reading parts of Chapter 17. But since the theorem itself is a 
theorem of point-set theory, we put the exposition of it here in Chapter 16. 

Before stating the theorem we make a definition and consider some illustrations
to motivate the theorem.

Definition. Let S be a point set, and suppose we have a collection of a certain
number of open sets such that each point of S belongs to at least one of the open
sets. Then we say that S is covered by the collection of open sets.

The number of open sets in the collection may be either finite or infinite. 

Example 1. Let S be the set of all points on the x-axis such that $0 < x \le 1$.
Let the collection of open sets be the sequence of open intervals

$$\left(\tfrac12, \tfrac32\right),\ \left(\tfrac14, 1\right),\ \left(\tfrac18, \tfrac58\right),\ \left(\tfrac1{16}, \tfrac38\right),\ \ldots,$$

the nth interval being $I_n = (1/2^n,\ (n + 2)/2^n)$. These intervals are shown schematically
in Fig. 166. The set S is covered by this collection, for every x such that
$0 < x \le 1$ lies in $I_n$ for some value of n.



Fig. 166. 


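Example 2. Let S be the set of points $0, \tfrac12, \tfrac14, \ldots, 1/(2n), \ldots$ on the x-axis.
Let the collection of open sets be the open intervals

$$\left(-\tfrac1{10}, \tfrac1{10}\right),\ \left(\tfrac13, 1\right),\ \left(\tfrac15, \tfrac13\right),\ \left(\tfrac17, \tfrac15\right),\ \ldots,\ \left(\tfrac{1}{2n+1}, \tfrac{1}{2n-1}\right),\ \ldots.$$

Note that S is covered by the collection of intervals.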

Example 3. Let S be the set of all points on the x-axis such that $0 < x \le 2$.
Consider the function $f(x) = 1/x$, and define a collection of open intervals as
follows: Suppose $0 < \epsilon < \tfrac12$. If $x_0$ is in S, denote by $I(x_0)$ the set of all x such that
x > 0 and $|f(x) - f(x_0)| < \epsilon$, that is, the set of all x > 0 such that

$$\frac{1}{x_0} - \epsilon < \frac{1}{x} < \frac{1}{x_0} + \epsilon.$$

It is clear from Fig. 167 that $I(x_0)$ is the open interval from $x_0'$ to $x_0''$, where
$x_0'$ and $x_0''$ are found by solving the equations

$$\frac{1}{x_0'} = \frac{1}{x_0} + \epsilon, \qquad \frac{1}{x_0''} = \frac{1}{x_0} - \epsilon.$$

One easily finds

$$x_0' = \frac{x_0}{1 + \epsilon x_0}, \qquad x_0'' = \frac{x_0}{1 - \epsilon x_0}.$$

Fig. 167.

Clearly $I(x_0)$ contains $x_0$. Consequently, as $x_0$ varies over S we get a
collection of intervals $I(x_0)$ which covers S.

The Heine-Borel theorem is concerned with this question: If S is covered by 
a collection of open sets, under what conditions on S is it possible to choose a 
finite number of open sets from the collection in such a way that the new finite 
collection still covers S? Let us examine the preceding examples with this 
question in mind. 

In Example 1 no finite number of the open intervals will suffice to cover S,
for the first n intervals leave uncovered all the points x of S for which
$0 < x \le 1/2^n$. Example 2 is different. The point 0 is covered by the special open
interval $(-\tfrac1{10}, \tfrac1{10})$. This same interval also covers all but a finite number of the
remaining points of S, namely, all points 1/(2n) with $1/(2n) < \tfrac1{10}$, or n = 6, 7,
8, .... The remaining points are $\tfrac12, \ldots, \tfrac1{10}$, and these are covered by the five
intervals $(\tfrac13, 1)$, $(\tfrac15, \tfrac13)$, ..., $(\tfrac1{11}, \tfrac19)$. Hence six intervals suffice to cover S in Example
2. In Example 3 no finite selection of the intervals $I(x_0)$ will suffice to cover S,
for, of any finite number of the $I(x_0)$, there will be one for which $x_0'$ is furthest to
the left, and this will leave all the x such that $0 < x \le x_0'$ uncovered.
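
The claims about Examples 1 and 2 can be checked mechanically. The following Python sketch is offered only as an illustration; the helper names `covered` and `example1_intervals` are hypothetical, and the check for Example 2 examines only a finite sample of the points of S (the remaining points 1/(2n) all lie in the special interval about 0).

```python
from fractions import Fraction

def covered(x, intervals):
    """True if x lies in at least one open interval (a, b) of the list."""
    return any(a < x < b for a, b in intervals)

def example1_intervals(n):
    """The first n intervals I_k = (1/2**k, (k + 2)/2**k) of Example 1."""
    return [(Fraction(1, 2**k), Fraction(k + 2, 2**k)) for k in range(1, n + 1)]

for n in range(1, 12):
    # The first n intervals miss the point 1/2**n of S ...
    assert not covered(Fraction(1, 2**n), example1_intervals(n))
    # ... although the next interval I_{n+1} does contain it.
    assert covered(Fraction(1, 2**n), example1_intervals(n + 1))

# Example 2: the special interval about 0 together with five others.
six = [(Fraction(-1, 10), Fraction(1, 10))] + [
    (Fraction(1, 2 * k + 1), Fraction(1, 2 * k - 1)) for k in range(1, 6)
]
sample = [Fraction(0)] + [Fraction(1, 2 * n) for n in range(1, 2000)]
assert all(covered(p, six) for p in sample)
```

The contrast is the point of the two examples: in Example 1 every finite subcollection leaves some point of S out, while in Example 2 six intervals already suffice.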

We now proceed directly to a formal statement and proof of the Heine-Borel 
theorem. 

THEOREM VII. Let S be a bounded and closed point set, and let S be covered
by a collection of open sets. Then a finite number of open sets may be chosen
from the collection in such a way that S is covered by the new finite
collection.

All our illustrative examples were given for point sets on a line. However,
the theorem holds equally for point sets on a line, in the plane, or in space. It is 
merely necessary to bear in mind the proper definitions of open and closed sets 
for each dimensionality. 

Proof of the theorem. We shall give the proof for point sets on a line. But 
the method of proof may be immediately applied in higher dimensions. 

The set S will lie on some closed interval $a \le x \le b$, since it is bounded. If S
is a finite set there is no problem; we have only to choose one open set to cover 
each point of S, and this will be a finite collection. Hence let us assume that S is 
an infinite point set. We make the assumption that no finite number of the open 
sets of the given collection will suffice to cover S. From this assumption we shall 
deduce a contradiction, and thus the theorem will be proved. 

We bisect the interval [a, b] and consider the parts of S lying in each of the 
two closed subintervals. If each of these parts could be covered by a finite 
subcollection of the open sets, so could S itself. Therefore, for at least one of 
the subintervals the corresponding part of S cannot be covered by a finite 
number of the given open sets. We bisect this interval and proceed as before. 
Each time we bisect, an infinite number of points of S must belong in the
subinterval which is retained, for otherwise this part of S could surely be
covered by a finite number of the open sets. The repeated bisection process
gives us a nest of closed intervals, and there is a unique point $x_0$ common to all
the intervals of the nest (Theorem VI, §2.8). This point $x_0$ is clearly an
accumulation point of S. It must therefore belong to S (Theorem I, §16.2),
because S is closed. Since $x_0$ is in S, there is some open set of the collection
which contains $x_0$. This open set must therefore contain all except a finite
number of the closed intervals of the nest which is shrinking down on $x_0$. This
brings us to a contradiction, for each interval of the nest has the property that 
the part of S in it cannot be covered by a finite number of the given open sets. 
As explained earlier, our arrival at this contradiction completes the proof. 

EXERCISE 

Let S be the set $1 \le x \le 2$, and define the open intervals $I(x_0)$ as in Example 3. Show
that S can be covered with a finite number N of these intervals, where N is the smallest
integer exceeding $1/\epsilon$. SUGGESTION: Consider the intervals $I(x_0)$ for $x_0 = 1, 1 + \epsilon,
1 + 2\epsilon, \ldots$.
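
Before proving the statement of this exercise, it may help to test it for particular values of $\epsilon$. The sketch below is an illustration only and rests on stated assumptions: it takes the end points of $I(x_0)$ to be $x_0/(1 + \epsilon x_0)$ and $x_0/(1 - \epsilon x_0)$, as found in Example 3, requires $0 < \epsilon < \tfrac12$, and uses hypothetical function names.

```python
from fractions import Fraction
import math

def interval(x0, eps):
    """The open interval I(x0) = (x0/(1 + eps*x0), x0/(1 - eps*x0)) of Example 3."""
    return (x0 / (1 + eps * x0), x0 / (1 - eps * x0))

def finite_cover(eps):
    """The intervals I(x0) for x0 = 1, 1 + eps, 1 + 2*eps, ..., N of them in all."""
    N = math.floor(1 / eps) + 1          # smallest integer exceeding 1/eps
    return [interval(1 + k * eps, eps) for k in range(N)]

def covers_1_to_2(intervals):
    """Check that consecutive open intervals overlap and together contain [1, 2]."""
    starts_left = intervals[0][0] < 1
    ends_right = intervals[-1][1] > 2
    overlap = all(a[1] > b[0] for a, b in zip(intervals, intervals[1:]))
    return starts_left and ends_right and overlap

for eps in (Fraction(1, 3), Fraction(2, 5), Fraction(1, 7), Fraction(1, 10)):
    assert covers_1_to_2(finite_cover(eps))
```

The overlap test suffices because both end points of $I(x_0)$ increase with $x_0$, so a chain of overlapping intervals has no gaps between its first left end point and its last right end point.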



17 / FUNDAMENTAL 
THEOREMS ON 
CONTINUOUS 
FUNCTIONS 


17 / PURPOSE OF THE CHAPTER

In Chapter 3, we proved several important theorems about continuous functions, 
confining ourselves always to the case of a single real variable. Analogues of 
some of these theorems for functions of two variables were considered in §5.3, 
but no proofs were given. With the aid of the concepts and theorems developed 
in Chapter 16 we are now in a position to make a deeper study of continuity. 

One important new concept will be introduced in this chapter: the concept of 
uniform continuity. This is needed for the theory of integration, in Chapter 18. 


17.1 / CONTINUITY AND SEQUENTIAL LIMITS 

For the definition of continuity of a function of one variable, we refer the 
student to §3. There is an alternative way of expressing continuity, using the 
notion of a convergent sequence. We express this as a theorem. 

THEOREM I. Suppose f is a function of x, defined on some interval I. Let $x_0$ be
a point of I. Then f is continuous at $x_0$ if and only if $\lim_{n\to\infty} f(x_n) = f(x_0)$ for
every sequence $\{x_n\}$ which has terms that belong to I and which is such that
$\lim_{n\to\infty} x_n = x_0$.

We shall not prove this theorem, because we are presently going to prove a 
more general theorem of which Theorem I is a special case. 

In dealing with continuity it is quite worth while to arrange matters so that 
what we say applies just as well to functions of two or three variables as to 
functions of one variable. The terminology of point-set theory enables us to do 
this. In particular, the concept of continuity can be defined in terms of the 
concept of neighborhood, without explicit mention of $\epsilon$ and $\delta$ or use of
inequalities. 

Definition. Let S be a point set, and let f be a function defined at each point of S.
If $P_0$ is a certain point of S, the function is said to be continuous at $P_0$ provided
that to each neighborhood V of the value $f(P_0)$ there corresponds a neighborhood
U of the point $P_0$ such that the value f(P) lies in the neighborhood V whenever
the point P lies in the set S and also in the neighborhood U.

This definition is equivalent to the previously given definitions (in §3 and
§5.3), though it is somewhat different in form from these earlier definitions. For
example, if f is a function of one real variable, we may take the set S to be on
the x-axis, and denote the point by x instead of P. The neighborhood V of $f(x_0)$
will be an interval defined by $|y - f(x_0)| < \epsilon$, or $f(x_0) - \epsilon < y < f(x_0) + \epsilon$, where $\epsilon$ is
some positive number. The neighborhood U will be an interval defined by
$|x - x_0| < \delta$, or $x_0 - \delta < x < x_0 + \delta$, where $\delta$ is some positive number. The statement
that f(x) is in V if x is in U is then the same as the statement that
$|f(x) - f(x_0)| < \epsilon$ if $|x - x_0| < \delta$.

One thing which deserves to be emphasized is that the point set S on which f 
is defined is not subject to any restrictions. Thus, if we are talking about a 
function of two variables, and P has co-ordinates (x, y), the point set S can be 
any kind of point set in the plane. It does not need to be a region (as defined in 
§5.1). For example, S could be the set of points on the parabola $y = x^2$, and
f(x, y) might be the radius of curvature at the point (x, y). Or, in three dimensions,
S might be the surface of an ellipsoid, and f(x, y, z) might be the distance
from (x, y, z) to the origin.

We now come to the generalization of Theorem I. 


THEOREM II. Suppose f is a function defined at each point of the set S. If $P_0$ is
a particular point of S, the function is continuous at $P_0$ if and only if
$\lim_{n\to\infty} f(P_n) = f(P_0)$ whenever $\{P_n\}$ is a sequence of points belonging to S such that
$\lim_{n\to\infty} P_n = P_0$.

Proof. Suppose f is continuous at $P_0$, and suppose $P_n$ belongs to S and $P_n$
converges to $P_0$. If V is any neighborhood of $f(P_0)$, we have to show that $f(P_n)$ is
in V when n is sufficiently large. Now, since f is continuous at $P_0$, there is a
neighborhood U of $P_0$ such that f(P) is in V if P is in S and U. The fact that $P_n$
converges to $P_0$ means that $P_n$ is in U if n is sufficiently large. But then $f(P_n)$
must be in V, and one part of the proof is complete.

We now assume that $f(P_n)$ converges to $f(P_0)$ whenever $\{P_n\}$ is a sequence in
S converging to $P_0$. We shall make the required proof by supposing that f is not
continuous at $P_0$, and deducing a contradiction. The denial of continuity at $P_0$
may be phrased in this way: There is some neighborhood V of $f(P_0)$ such that,
no matter what neighborhood U of $P_0$ is selected, some point P in U has the
property that P is in S but f(P) is not in V. Accordingly, let us choose a
succession of neighborhoods $U_1, U_2, \ldots, U_n, \ldots$ closing down on $P_0$ (e.g., $U_n$
consisting of all points at distance less than 1/n from $P_0$), and let $P_n$ be a point in
$U_n$ with properties as described above. Then $P_n$ must converge to $P_0$, but $f(P_n)$
does not converge to $f(P_0)$, because $f(P_n)$ is always outside the neighborhood V.
This contradicts our initial assumption. Therefore the proof is complete.


17.2 / THE BOUNDEDNESS THEOREM

We give a generalization of Theorem II, §3.1, and Theorem I, §5.3. 

THEOREM III. Let S be a bounded and closed point set, and let f be a function
defined on S which is continuous at each point of S. Then the values of f are
bounded, or, as we customarily say, f is bounded on S.

The proof is rather simple. Suppose the values of f were not bounded. Then
no finite interval of the real number scale contains all the values, and for each
positive integer n there must be at least one point $P_n$ in S such that $|f(P_n)| > n$.
The sequence $\{P_n\}$ is bounded, and must therefore contain a convergent subsequence
(see Theorem V, §16.31, and the remarks about it in §16.41). Denote
the subsequence by $\{P_{n_i}\}$ and its limit by Q. Since S is closed, Q belongs to S.
Then, since f is continuous at Q, $f(P_{n_i})$ converges to f(Q), by Theorem II, §17.1.
This is in contradiction to $|f(P_{n_i})| > n_i$, which has the consequence that $|f(P_{n_i})| \to \infty$
as $i \to \infty$. To avoid the contradiction we are forced to conclude that f is
bounded.


17.3 / THE EXTREME-VALUE THEOREM 

We give a generalization of Theorem III, §3.2, and Theorem II, §5.3. 

THEOREM IV. Let S be a non-empty bounded and closed set, and suppose f is 
a function defined on S and continuous at each point of S. Let m and M be 
the greatest lower bound and least upper bound, respectively, of the values of
f on S. Then there is some point of S at which f has the value M, and there is 
also a point at which f has the value m. 

The existence of m and M is guaranteed by the theorem in §17.2 and 
Theorem II, §2.7. 

We begin the proof by observing that since M is the least upper bound of
the values f(P), there must exist a sequence $\{P_n\}$ in S such that $f(P_n)$ converges
to M. The sequence $\{P_n\}$ is bounded, and just as in the proof of Theorem III we
obtain a subsequence $\{P_{n_i}\}$ converging to a point Q of S, with $f(P_{n_i})$ converging to
f(Q). But $f(P_{n_i})$ also converges to M, so M = f(Q). This proves the theorem as
regards M. The proof for m is essentially the same.


17.4 / UNIFORM CONTINUITY 

When we say that a function is continuous at a certain point, this describes a
certain relation between the values of the function and the values of the
independent variable near a particular value of the independent variable. The
new concept of uniform continuity, which we are concerned with in this section,
has to do with continuity of a function at many different points. The word
“uniform” is used because, when we say that a function is uniformly continuous
on a certain point set, this means that there is a certain quality about the
continuity which is the same at all the points of the set. Before giving a formal
definition of uniform continuity we shall have to see in what sense there can be
recognizable differences in the continuity of a function at different points.

The type of difference we have in mind has to do with the amount by which 
the independent variable may be allowed to change if the value of the function is 
not permitted to change more than a specified amount. 

Example 1. Consider the function $f(x) = x^2$. Suppose we start with some
particular value $x_0$, and ask: How much may x differ from $x_0$ if f(x) is required to
differ from $f(x_0)$ by less than 2 units? The answer depends on the value of $x_0$. If
$x_0 = 0$, the requirement on x is that $x^2 < 2$, or $|x| < \sqrt2$, so that x must differ from
$x_0$ by less than $\sqrt2$. But if $x_0 = 8$ the requirement is that $|x^2 - 64| < 2$, or
$62 < x^2 < 66$. Now $\sqrt{66} = 8.124\ldots$, $\sqrt{62} = 7.874\ldots$, so that the permissible
difference between x and 8 is less than $\sqrt{66} - 8 = 0.124\ldots$. The student will see
at once from a graph of $y = x^2$ that the larger we make $x_0$, the smaller is the
amount by which x may differ from $x_0$ if $x^2$ is to differ from $x_0^2$ by less than 2
units.

If instead of 2 units we specify $\epsilon$ units, where $\epsilon > 0$, the situation is
essentially the same. If we want to find a number $\delta$ so that $|x^2 - x_0^2| < \epsilon$ whenever
$|x - x_0| < \delta$, the value of $\delta$ will depend on $x_0$ as well as on $\epsilon$, and with $\epsilon$ fixed, $\delta$
must be made smaller and smaller as $x_0$ gets larger. There is no single positive
value of $\delta$ which is small enough to serve simultaneously for all values of $x_0$.
This illustrates the opposite of uniform continuity.
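
The shrinking of the permissible difference can be tabulated. The sketch below is an illustration rather than part of the text; it assumes $x_0 \ge 0$, and the function name `largest_delta` is hypothetical. For such $x_0$ the binding requirement is on the right of $x_0$, where $(x_0 + \delta)^2 = x_0^2 + \epsilon$, and this gives the largest admissible value $\sqrt{x_0^2 + \epsilon} - x_0$.

```python
import math

def largest_delta(x0, eps):
    """Largest delta with |x**2 - x0**2| < eps whenever |x - x0| < delta (x0 >= 0)."""
    # The binding constraint lies to the right of x0: (x0 + delta)**2 = x0**2 + eps.
    return math.sqrt(x0 * x0 + eps) - x0

# With eps = 2 the values for x0 = 0 and x0 = 8 are the ones worked out in
# Example 1, namely sqrt(2) and sqrt(66) - 8.
for x0 in (0, 1, 8, 100, 10_000):
    print(x0, largest_delta(x0, 2))
```

The printed values decrease toward 0 as $x_0$ grows, which is the numerical face of the statement that no single $\delta$ serves for all $x_0$.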

Definition. Let S be a point set on the x-axis, forming part or all of the set of
values of x for which f(x) is defined. Then f is said to be uniformly continuous on
S provided that to each $\epsilon > 0$ there corresponds a $\delta > 0$ such that $|f(x) - f(x_0)| < \epsilon$
whenever x and $x_0$ are any points of S such that $|x - x_0| < \delta$.

Let us carry the discussion of Example 1 further in the light of this
definition. The function $f(x) = x^2$ is not uniformly continuous on the set S of all
x. In fact if S is any set which is not bounded, the function is not uniformly
continuous on S, for if there is no limit to how large $x_0$ can be, no single $\delta$ can be
found which meets the requirements of the definition. On the other hand, if S is
a bounded set, the function is uniformly continuous on S. For instance, if S is
the closed interval $0 \le x \le 8$, we can take $\delta = \epsilon/16$ in the definition of uniform
continuity. To see this, observe that if x and $x_0$ belong to S,

$$|x^2 - x_0^2| = |(x - x_0)(x + x_0)| \le 16|x - x_0|$$

and so

$$|x^2 - x_0^2| < \epsilon \quad \text{if} \quad |x - x_0| < \frac{\epsilon}{16}.$$

Example 2. The function $f(x) = 1/x$ is not uniformly continuous on the set S
defined by $0 < x \le 1$, but it is uniformly continuous on the set S defined by $x \ge 1$.
The first assertion follows from the fact that if $x_0 > 0$ and $\delta$ is chosen so that

$$\left|\frac{1}{x} - \frac{1}{x_0}\right| < \epsilon \quad \text{if} \quad |x - x_0| < \delta,$$

the value of $\delta$ must approach 0 as $x_0 \to 0$. On the other hand, for the set S
defined by $x \ge 1$ we can take $\delta = \epsilon$, because if x and $x_0$ belong to S and
$|x - x_0| < \epsilon$ we have

$$\left|\frac{1}{x} - \frac{1}{x_0}\right| = \frac{|x_0 - x|}{x x_0} \le |x - x_0| < \epsilon.$$
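
The way $\delta$ is forced toward 0 on the set $0 < x \le 1$ can be made explicit. The short derivation below is added for illustration and assumes the end points $x_0' = x_0/(1 + \epsilon x_0)$ and $x_0'' = x_0/(1 - \epsilon x_0)$ found in Example 3 of §16.6; the symbol $\delta_{\max}$ is introduced here and is not the text's notation. Since $x_0'$ is the end point nearer to $x_0$, and every x with $0 < x \le x_0'$ belongs to the set and violates the inequality, the largest admissible $\delta$ is the distance from $x_0$ to $x_0'$:

```latex
\delta_{\max}(x_0) \;=\; x_0 - x_0'
  \;=\; x_0 - \frac{x_0}{1 + \epsilon x_0}
  \;=\; \frac{\epsilon x_0^{2}}{1 + \epsilon x_0}
  \;<\; \epsilon x_0^{2}
  \;\longrightarrow\; 0 \qquad (x_0 \to 0^{+}).
```

No positive $\delta$ can therefore serve for every $x_0$ in $0 < x \le 1$, in agreement with the first assertion of Example 2.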


The essential theorem about uniform continuity will now be given. 


THEOREM V. Suppose S is a closed and bounded point set, and suppose the
function f is defined and continuous at each point of S. Then f is uniformly
continuous on S.


Proof. We make use of the Heine-Borel theorem. Suppose $\epsilon > 0$. If x' is any
point of S, the definition of continuity assures us that there is some positive
number h such that $|f(x) - f(x')| < \epsilon/2$ if x and x' belong to S and $|x - x'| < h$. The
size of h will usually vary as x' is varied. Now consider the open interval
$x' - (h/2) < x < x' + (h/2)$. When x' varies over S, the collection of all these open
intervals covers the set S. By the Heine-Borel theorem (§16.6) a finite number of
these intervals suffice to cover S. Let the centers of these intervals be denoted
by $x_1, \ldots, x_n$ and let the corresponding values of h be $h_1, \ldots, h_n$. Choose $\delta$ as
the smallest of the numbers $h_1/2, \ldots, h_n/2$. We shall show that this $\delta$ will serve
as required in the definition of uniform continuity. Suppose x and $x_0$ belong to S
and $|x - x_0| < \delta$. Then $x_0$ belongs to one of the finite set of open intervals, say the
one with end points $x_i \pm (h_i/2)$, so that $|x_0 - x_i| < h_i/2$. Now

$$|x - x_i| \le |x - x_0| + |x_0 - x_i| < \delta + \frac{h_i}{2}.$$

But $\delta \le h_i/2$, and so $|x - x_i| < h_i$. The inequalities satisfied by $|x_0 - x_i|$ and $|x - x_i|$
guarantee that

$$|f(x_0) - f(x_i)| < \frac{\epsilon}{2} \quad \text{and} \quad |f(x) - f(x_i)| < \frac{\epsilon}{2}.$$

Therefore $|f(x) - f(x_0)| \le |f(x) - f(x_i)| + |f(x_i) - f(x_0)| < \epsilon$.

This completes the proof.

The definition of uniform continuity can be worded so as to apply to
functions of more than one variable. It is merely necessary to write the condition
involving $\epsilon$ and $\delta$ in the form “$|f(P) - f(P_0)| < \epsilon$ whenever P and $P_0$ are points of
S such that $d(P, P_0) < \delta$.” Here $d(P, P_0)$ is the distance between P and $P_0$.
Theorem V remains valid for functions of several variables, and the proof as
given above can be adapted easily to the new situation by using distances in
place of absolute values.


17.5 / CONTINUITY OF SUMS, PRODUCTS, AND QUOTIENTS 

The process of addition may be considered as defining a function of two 
variables: 


$$s(x, y) = x + y.$$

The same may be said of multiplication and division:

$$p(x, y) = xy, \qquad q(x, y) = \frac{x}{y}.$$

We use the letters s, p, q for these three functions because of the names 
sum, product, quotient. The following theorem states a fundamental fact about 
these functions: 


THEOREM VI. The functions s and p are continuous at all points (x, y). The 
function q is continuous at all points where it is defined (i.e., at all points 
except those for which y = 0). 

The proof of this theorem is very similar to the proof of Theorem XIV, 
§1.64, and on that account we omit the formal proof. Some suggestions for the 
student who wishes to work out the details are given in the following exercises. 


EXERCISES 

1. Show that $|s(x, y) - s(x_0, y_0)| < \epsilon$ if (x, y) is in the square neighborhood of side $\epsilon$
with center at $(x_0, y_0)$, and so certainly if (x, y) is in the circular neighborhood of radius
$\epsilon/2$ with center at $(x_0, y_0)$.

2. Let M be a number larger than $|x_0|$ and $|y_0|$. Let $\delta$ be the smaller of the numbers
$M - |y_0|$, $\epsilon/(2M)$. Show that $|p(x, y) - p(x_0, y_0)| < \epsilon$ if $|x - x_0| < \delta$ and $|y - y_0| < \delta$. Start
from the fact that $xy - x_0 y_0 = (x - x_0)y + x_0(y - y_0)$ and so
$|xy - x_0 y_0| \le |x - x_0||y| + M|y - y_0|$.

3. Study the proof of (1.64-4) and so show that $|q(x, y) - q(x_0, y_0)| < \epsilon$ if $|x - x_0| < \delta$
and $|y - y_0| < \delta$, where $\delta$ is the smaller of the numbers

$$\tfrac12 |y_0|, \qquad \frac{|y_0|^2 \epsilon}{2(|x_0| + |y_0|)}.$$

Assume that $y_0 \ne 0$.


17.6 / PERSISTENCE OF SIGN 

We shall now give a generalization of Theorem V, §3.3. 

THEOREM VII. Suppose f is continuous at a point $P_0$ of the set S on which it is
defined, and suppose $f(P_0) \ne 0$. Then there is some neighborhood of $P_0$ such
that at all points of S in this neighborhood the sign of f(P) is the same as the
sign of $f(P_0)$.

The truth of this theorem depends merely on the fact that f(P) is near $f(P_0)$
if P is near $P_0$. If $f(P_0) \ne 0$, all values of f(P) sufficiently near $f(P_0)$ will have the
same sign as $f(P_0)$.

17.7 / THE INTERMEDIATE-VALUE THEOREM 

An intermediate-value theorem was stated as Theorem IV, §3.3. We shall now 
consider how this theorem is to be generalized so as to apply to functions of 
more than one variable. The problem is of this nature: Suppose a function is 
defined on some point set, and is continuous at each point. If the function takes 
on the value 2 at one point and the value -3 at another point, does it also take on 
all values between -3 and 2? An example will show that such is not always true. 
For instance, let $f(x, y) = 1/(xy)$. Then $f(2, \tfrac14) = 2$, $f(-\tfrac16, 2) = -3$. But f(x, y) never
takes on the value 0, and 0 is between -3 and 2, in spite of the fact that f is
continuous at each point of the set where it is defined. The explanation of the
difficulty lies in the fact that the points $(2, \tfrac14)$ and $(-\tfrac16, 2)$ are separated by the line
x = 0, on which f is not defined.
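
The arithmetic of this counterexample is easy to confirm with exact fractions. The sketch below is illustrative only; the names `A`, `B`, `segment_point`, and `t_cross` are introduced here and are not the text's notation. It evaluates f at the two points and locates where the straight segment joining them meets the line x = 0.

```python
from fractions import Fraction

def f(x, y):
    """f(x, y) = 1/(xy), defined only where x and y are both nonzero."""
    return 1 / (x * y)

A = (Fraction(2), Fraction(1, 4))
B = (Fraction(-1, 6), Fraction(2))
assert f(*A) == 2 and f(*B) == -3

def segment_point(t):
    """The point A + t(B - A), 0 <= t <= 1, on the straight segment from A to B."""
    return (A[0] + t * (B[0] - A[0]), A[1] + t * (B[1] - A[1]))

# The x co-ordinate changes sign along the segment, so the segment must meet
# the line x = 0, where f is undefined. Solve A_x + t(B_x - A_x) = 0 for t.
t_cross = A[0] / (A[0] - B[0])
assert 0 < t_cross < 1 and segment_point(t_cross)[0] == 0
```

The straight segment is only one path from A to B, so this check does not by itself show that the two points are separated; that follows from the fact that any continuous path between them must pass through a point where x = 0.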

If f is defined and continuous on a set S which is not separated into several
disconnected parts, it can be proved that f has the property of taking on all
values between every pair of distinct values. In order to make this assertion in 
an exact form, and prove it, we must first make some definitions. 

Definition. Two point sets $S_1$ and $S_2$ are called separated if the following three
conditions hold: 

1. Neither set is empty. 

2. No point belongs to both sets. 

3. Neither set has a point of accumulation belonging to the other set. 

Example 1. The intervals $-1 < x < 0$ and $0 < x < 1$ are separated sets. The
interior and exterior of the circle $x^2 + y^2 = 1$ are separated sets in the plane. The
points on opposite sides of the plane z = 0 form two separated sets in space of 
three dimensions. 

Definition. A set S is called connected if it cannot be divided into two parts which
are separated. 

Example 2. An interval on the x-axis, with or without either end point, is a 
connected set. The first quadrant in the xy-plane (x > 0, y > 0) is a connected set. 
The surface of a sphere is a connected set. 

If S is an open set which is connected, it may be shown that any two points
of S can be joined by an arc which lies entirely in S. In fact, the arc may be
taken to consist of a finite number of line segments joined end to end. In §7.4 we
defined an open set to be connected if it had this property. For open sets the two
definitions are equivalent, but for sets which are not open the present definition
is the proper one to use.

We now come to the general intermediate-value theorem. 

THEOREM VIII. Suppose S is a connected set and that f is a function which is
continuous at each point of S. Suppose f takes on two different values $C_1$ and
$C_2$ at points $P_1$ and $P_2$ in S. Then, for every number k between $C_1$ and $C_2$,
there is some point of S at which f takes on the value k.

Proof. Let us suppose the notation is such that $C_1 < k < C_2$. We shall
suppose that for some particular k there is no point P such that f(P) = k. We 
shall show that with this assumption we can divide S into two separated parts. 
This contradiction will complete the proof of the theorem. 

Let Si be the set of those points of S for which /(P) <k, and S 2 the set for 
which k </(P). Observe that Pi is in Si, P 2 is in S 2 , and each point of S belongs 
either to Si or to S 2 . The sets Si and S 2 are separated. In fact, since conditions 
(1) and (2) in the definition of separated sets are clearly satisfied, we have only to 
verify the third condition. Now, it is impossible for a point of accumulation of S 2 
to belong to Si. For suppose Q were such a point. Then we could select a 
sequence {Q n } of points in S 2 such that Q„ converges to Q . Then f(Q„) converges 
to f(Q ), by continuity. But f(Q n ) > k, since Q„ is in S 2 . Therefore the limit f(Q) 
cannot be less than k. But f(Q) < k since Q was assumed to be in Si. This shows 
that no such point as Q can exist. The same argument shows that no point of 
accumulation of Si can belong to S 2 . This proves that Si and S 2 are separated, 
and the proof is complete. 

18 / THE NATURE OF THE CHAPTER

Heretofore, in considering integrals defined as limits of sums, we have taken it 
for granted that the limits do exist. The applications of integrals in calculating 
various geometrical and physical magnitudes are commonly of such a kind that 
we consider integrals of continuous functions, and it is plausible, from an 
intuitive standpoint, that the process of defining the integral of a continuous 
function does actually lead to a definite limit. 

For certain purposes it becomes desirable and necessary to consider the 
integration of functions which may be discontinuous. Consequently it is neces- 
sary to lay down the fundamental definitions about integrals in such a way that 
the theory can be developed without depending on hypotheses about continuity. 
In this chapter we shall lay the groundwork of a general theory of integration. 
The form of this theory goes back to Bernhard Riemann (1826-1866), a German 
mathematician. The first purpose of this chapter is to set forth clearly the 
concept of the Riemann integral. A function which can be integrated according 
to Riemann’s definition is called integrable. We show that continuous functions 
and certain kinds of discontinuous functions are integrable. 

In certain ways the theory of multiple integrals is much more complicated 
than the theory of integrals of functions of one variable. Our discussion of 
multiple integrals is designed mainly to lead up to an analytical treatment of the 
relation between a multiple integral and its evaluation by iterated integrals. This 
is a topic on which we have not aimed to give the greatest possible generality, 
but a reasonably simple and comprehensible treatment of the kind of multiple 
integrals which ordinarily arise in calculus. 

The chapter concludes with a very brief, conceptual, nontheoretical dis- 
cussion of Stieltjes integrals. 


18.1 /THE DEFINITION OF INTEGRABILITY 

Suppose that a, b are any real numbers such that a < b. We shall consider 
functions defined on the closed interval [a, b]. We shall not require the functions 
to be continuous, but we shall assume that each function is bounded. 

Now let us subdivide the interval into any number of subintervals. This is 
done by inserting points between a and b. Thus, suppose 


    $$ a = x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n = b. $$

Then we have n subintervals

    $$ [x_0, x_1],\ [x_1, x_2],\ \ldots,\ [x_{n-1}, x_n]. $$

We shall refer to such a subdivision as a partition of [a, b]. The partition
determined by these particular points will be called the partition $(x_0, x_1, \ldots, x_n)$.

In the closed subinterval $[x_{i-1}, x_i]$ let $m_i$ and $M_i$ denote respectively the
greatest lower bound and least upper bound of the values f(x). Then we form the
two sums

    $$ s = m_1(x_1 - x_0) + m_2(x_2 - x_1) + \cdots + m_n(x_n - x_{n-1}), $$    (18.1-1)

    $$ S = M_1(x_1 - x_0) + M_2(x_2 - x_1) + \cdots + M_n(x_n - x_{n-1}). $$    (18.1-2)

We call s a lower sum and S an upper sum. Observe that these sums are
dependent upon the particular function f and the particular partition.

Let us denote the greatest lower bound and least upper bound of f(x) on all
of [a, b] by m, M respectively. Then $m \le m_i$ and $M_i \le M$ for i = 1, 2, . . . , n. Of
course it is also true that $m_i \le M_i$. We observe that

    $$ (x_1 - x_0) + (x_2 - x_1) + \cdots + (x_n - x_{n-1}) = b - a. $$

Therefore, it is readily seen that

    $$ m(b - a) \le s \le S \le M(b - a). $$    (18.1-3)
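
The sums (18.1-1) and (18.1-2) can be tried out numerically. The following Python sketch is an illustration only and is not part of the theory: the function name `lower_upper_sums` and the sampling parameter `samples` are assumptions of the example, and the greatest lower bound and least upper bound on each subinterval are merely estimated by sampling, which is exact only when the extreme values occur at sampled points.

```python
def lower_upper_sums(f, points, samples=200):
    """Approximate the lower sum s and upper sum S of f for a partition.

    The infimum and supremum on each subinterval are estimated by sampling,
    so the result is exact only when the extreme values occur at sampled
    points (for a monotonic f they occur at the end points).
    """
    s = S = 0.0
    for left, right in zip(points, points[1:]):
        values = [f(left + (right - left) * k / samples)
                  for k in range(samples + 1)]
        s += min(values) * (right - left)
        S += max(values) * (right - left)
    return s, S


def square(x):
    return x * x


partition = [0.0, 0.25, 0.5, 0.75, 1.0]
s, S = lower_upper_sums(square, partition)
m, M = 0.0, 1.0  # bounds of x*x on all of [0, 1]
print(s, S)      # both lie between m*(b - a) and M*(b - a), as in (18.1-3)
```

For this increasing function the sampled values include both end points of every subinterval, so the estimates agree with the true lower and upper sums.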

The next step in our procedure is motivated by geometry. Suppose for a
moment that the function f is continuous and that all its values are positive. The
lower sum s then represents the area of a sum of rectangles all of which lie
between the x-axis and the curve y = f(x) (see Fig. 168). It then seems plausible
to suppose that if we consider all possible partitions and the corresponding lower 
sums, the least upper bound of all these lower sums s will be exactly the area 
between the x-axis and the curve and between x = a and x = b. It likewise seems 
plausible to suppose that this area is the greatest lower bound of all the upper 
sums S when all possible partitions are considered (again, see Fig. 168). 

Definition. As a result of the foregoing considerations, let us define

    I = least upper bound of all lower sums s,

    J = greatest lower bound of all upper sums S.

Then we shall say that the function f is integrable on [a, b] if I = J, and in that case
we shall call the common value of I and J the definite integral of f from x = a to
x = b, and denote it by

    $$ \int_a^b f(x)\,dx. $$

This definition applies to each function which is defined and bounded on [a, b].
For each such function we get unique values for I and J, but it may happen that
$I \ne J$. It is only when I = J that the function is integrable.

Example 1. Consider the function defined on [0, 1] by setting f(x) = 0 if x is
rational and f(x) = 1 if x is irrational. For this function and this interval I = 0,
J = 1, so that the function is not integrable. To see the truth of this assertion,
note that any subinterval contains both rational and irrational values of x, and
therefore that $m_i = 0$, $M_i = 1$. Consequently, from the definitions of s and S we
see that s = 0, S = 1, no matter how the partition is chosen. Accordingly, I = 0,
J = 1.

The student will observe that the foregoing definition of $\int_a^b f(x)\,dx$ is not the
same as the definition given in §1.5. The definition which we are now considering 
makes no reference to the concept of a limit of approximating sums. Later on in 
this chapter we shall see that the two different forms of definition are equivalent. 
Our first concern, however, is to develop enough theory to furnish a practical 
means of telling when a function is integrable. 

LEMMA I. Suppose we start with a certain partition, and then obtain a new
partition by inserting some additional points. Let s, S refer to the lower and
upper sums for the original partition, while s', S' are the sums for the new
partition. Then $s \le s'$ and $S' \le S$.

For simplicity we shall prove this on the supposition that just one new point
is inserted. Suppose for definiteness that the new point $\xi$ is between $x_0$ and $x_1$.
Let

    $M_1'$ = least upper bound of f(x) for $x_0 \le x \le \xi$,

    $M_1''$ = least upper bound of f(x) for $\xi \le x \le x_1$.

Then certainly $M_1' \le M_1$ and $M_1'' \le M_1$. Therefore

    $$ M_1'(\xi - x_0) + M_1''(x_1 - \xi) \le M_1(\xi - x_0) + M_1(x_1 - \xi) = M_1(x_1 - x_0). $$

From this it follows that $S' \le S$, because S' differs from S only in having
$M_1'(\xi - x_0) + M_1''(x_1 - \xi)$ in place of $M_1(x_1 - x_0)$. The proof that $s \le s'$ is similar. If
more than one point is inserted we need only apply the argument several times.
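
The effect of inserting a point can be checked on a small case. The sketch below is illustrative only: the helper `sums_for_increasing` and the choice of the function $x^3$ are assumptions of the example. It relies on the fact that for a nondecreasing function the greatest lower bound and least upper bound on a subinterval are the values at its left and right end points, and it uses exact rational arithmetic.

```python
from fractions import Fraction


def sums_for_increasing(f, points):
    """Exact lower and upper sums for a nondecreasing f.

    For such f, m_i = f(x_{i-1}) and M_i = f(x_i) on each subinterval.
    """
    pairs = list(zip(points, points[1:]))
    lower = sum(f(a) * (b - a) for a, b in pairs)
    upper = sum(f(b) * (b - a) for a, b in pairs)
    return lower, upper


def cube(x):
    return x ** 3


coarse = [Fraction(0), Fraction(1, 2), Fraction(1)]
fine = sorted(coarse + [Fraction(1, 4)])  # insert one new point, xi = 1/4

s, S = sums_for_increasing(cube, coarse)
s_new, S_new = sums_for_increasing(cube, fine)
print(s, s_new, S_new, S)  # the lower sum rises and the upper sum falls
```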

LEMMA II. For a given function and any two partitions, the lower sum for one
partition is algebraically less than or equal to the upper sum for the
other partition.

Proof. Denote the sums corresponding to the two partitions by $s_1$, $S_1$ and $s_2$,
$S_2$. Now consider the third partition which is obtained by considering
simultaneously all the points of subdivision of the first two partitions; denote the
corresponding sums by $s_3$, $S_3$. We know by (18.1-3) that $s_3 \le S_3$. Also, we know
by Lemma I that $s_1 \le s_3$ and $S_3 \le S_2$. Therefore, combining the inequalities, we
see that $s_1 \le S_2$. This is what Lemma II asserts.

LEMMA III. It is always true that $I \le J$.

This is an immediate consequence of Lemma II and the definitions of I and J. 
We can now state an important criterion for integrability. 

THEOREM I. Suppose f is bounded on [a, b], and suppose that corresponding
to each positive $\epsilon$ there is a partition of [a, b] such that the corresponding
upper and lower sums satisfy the inequality $S - s < \epsilon$. Then f is integrable.

Proof. If the conditions as stated in the theorem are fulfilled, we have
$S < s + \epsilon$. But $s \le I$ and $S \ge J$, by the definitions of I and J. Therefore, combining
inequalities, we see that $J \le S < s + \epsilon \le I + \epsilon$, or $J < I + \epsilon$. Since this conclusion
is valid for every positive $\epsilon$, we infer that $J \le I$. But we also know that $I \le J$
(Lemma III). Therefore I = J. This means that f is integrable, by definition.

Example 2. Suppose f is defined on [0, 2] as follows:

    $$ f(x) = 1 \text{ if } 0 \le x < 1, \qquad f(x) = 2 \text{ if } 1 \le x \le 2. $$

We can show by Theorem I that f is integrable.

For this purpose suppose a positive $\epsilon$ is assigned. Let h be a positive
number smaller than 1 and also smaller than $\epsilon/2$.
Consider the partition defined by

    $$ x_0 = 0, \quad x_1 = 1 - h, \quad x_2 = 1 + h, \quad x_3 = 2. $$

It is readily seen from Fig. 169 that

    $$ m_1 = M_1 = 1, \quad m_2 = 1, \quad M_2 = 2, \quad m_3 = M_3 = 2, $$

and therefore that

    $$ s = 1 \cdot (1 - h) + 1 \cdot 2h + 2(1 - h) = 3 - h, $$

    $$ S = 1 \cdot (1 - h) + 2 \cdot 2h + 2(1 - h) = 3 + h. $$

Consequently $S - s = 2h < \epsilon$. Therefore f is integrable, by Theorem I. In giving
this argument we did not need to find the value of the integral, but it is not hard
to see that

    $$ \int_0^2 f(x)\,dx = 3. $$    (18.1-4)

Note that the function is discontinuous at x = 1.
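
The arithmetic of Example 2 can be verified exactly. In the following sketch the function name `example_sums` and the particular values of $\epsilon$ and h are assumptions chosen for illustration; the bounds on the three subintervals are entered by hand from the definition of f, and any h with 0 < h < 1 and 2h < $\epsilon$ serves equally well.

```python
from fractions import Fraction


def example_sums(h):
    """Lower and upper sums of Example 2 for the partition (0, 1-h, 1+h, 2)."""
    points = [Fraction(0), 1 - h, 1 + h, Fraction(2)]
    m = [1, 1, 2]  # greatest lower bounds on the three subintervals
    M = [1, 2, 2]  # least upper bounds on the three subintervals
    widths = [b - a for a, b in zip(points, points[1:])]
    s = sum(mi * w for mi, w in zip(m, widths))
    S = sum(Mi * w for Mi, w in zip(M, widths))
    return s, S


epsilon = Fraction(1, 100)
h = epsilon / 4  # satisfies 0 < h < 1 and 2h < epsilon
s, S = example_sums(h)
print(s, S, S - s)  # s = 3 - h, S = 3 + h, and S - s = 2h
```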
The following theorem is the converse of Theorem I.

THEOREM II. If f is integrable on [a, b], and if $\epsilon > 0$, there is a partition with
upper and lower sums such that $S - s < \epsilon$.

The proof is left as an exercise. 


EXERCISES 

1. Prove (18.1-4) in Example 2 by the following steps: First, $I \ge 3$; next, $J \le 3$; and
finally, I = J = 3. Explain each step fully.

2. Suppose f is defined as follows: f(x) = 2 if $0 \le x < 1$, f(1) = 0, f(x) = -1 if
1 < x < 2, f(2) = 3, f(x) = 0 if 2 < x < 3, f(3) = 1. (a) Prove that f is integrable, using an
argument something like that in Example 2, but with six subintervals. (b) Find the value
of $\int_0^3 f(x)\,dx$, using an argument like that of Exercise 1.

3. Suppose f is defined by the requirement that f(x) = 2 if x is a rational number of
the form $p/2^q$, where p can take on all the values 0, ±1, ±2, . . . and q can take on all the
values 1, 2, . . . , and f(x) = 1 for all other values of x. Calculate I and J for this function
on the interval [0, 2], and thus prove that f is not integrable.

4. Prove Theorem II.

5. If f is integrable, so is the absolute-value function |f(x)|. Prove this by showing
that if s, S refer to f, and s', S' refer to |f|, then $S' - s' \le S - s$. Then use Theorem I.

6. If f is integrable over the interval [a, b], it is also integrable over any closed
subinterval of [a, b]. Prove this, using Lemma I and Theorem I.

7. Suppose a < b < c and that f is integrable over [a, b] and also over [b, c]. Prove
by Theorems I and II that f is integrable over [a, c].


18.11 / THE INTEGRABILITY OF CONTINUOUS FUNCTIONS 

Every continuous function is integrable. We state this in a formal theorem. 

THEOREM III. If a function f is continuous at each point of [a, b], it is 
integrable on that interval. 

Proof. The argument hinges on Theorem I and on the uniform continuity of
the function. Suppose $\epsilon > 0$. Choose $\delta$ so that

    $$ |f(x') - f(x'')| < \frac{\epsilon}{2(b - a)} $$    (18.11-1)

if x' and x'' are points of [a, b] such that $|x' - x''| < \delta$. This may be done, since f
is uniformly continuous (Theorem V, §17.4). Now consider any partition
$(x_0, x_1, \ldots, x_n)$ such that all the subintervals have length less than $\delta$, and let s, S
be the corresponding lower and upper sums. The boundedness of f is guaranteed
by Theorem II, §3.1. Now, in the interval $[x_{i-1}, x_i]$ we can choose a point x' so
that f(x') is as close as we like to $M_i$ and a point x'' so that f(x'') is as close as we
like to $m_i$. Since (18.11-1) holds, we conclude that

    $$ M_i - m_i \le \frac{\epsilon}{2(b - a)}. $$

But then, by (18.1-1) and (18.1-2),

    $$ S - s = (M_1 - m_1)(x_1 - x_0) + \cdots + (M_n - m_n)(x_n - x_{n-1}), $$    (18.11-2)

and so

    $$ |S - s| \le \frac{\epsilon}{2(b - a)} [(x_1 - x_0) + \cdots + (x_n - x_{n-1})] = \frac{\epsilon}{2}. $$

Thus f is integrable, by Theorem I.


EXERCISE 

If f is integrable on [a, b], if $f(x) \ge 0$ for each x, and if there is at least one point $\xi$ where f
is continuous and $f(\xi) > 0$, prove that $\int_a^b f(x)\,dx > 0$.


18.12 / INTEGRABLE FUNCTIONS WITH DISCONTINUITIES 

A function such that f(x) never decreases as x increases can have points of 
discontinuity. Nevertheless, such a function is integrable over any closed 
interval on which it is defined. 

THEOREM IV. Suppose f(x) is defined when $a \le x \le b$, and that $f(x') \le f(x'')$ if
x' < x''. Then f is integrable on [a, b].

Proof. There are two cases to consider. If f(a) = f(b), the hypothesis
guarantees that f(x) is constant on the interval, and in this case certainly the
upper and lower sums are both equal to f(a)(b - a), no matter what partition is
chosen, and so the function is integrable. The other case is that in which
f(a) < f(b). If $\epsilon > 0$ we can choose a partition in which all the subintervals are so
short that the conditions

    $$ x_i - x_{i-1} < \frac{\epsilon}{f(b) - f(a)}, \qquad i = 1, \ldots, n, $$

are satisfied. Now, the special hypothesis on f assures that $m_i = f(x_{i-1})$ and
$M_i = f(x_i)$. That is, as x goes from $x_{i-1}$ to $x_i$, f(x) increases from its smallest to its
largest value in the subinterval. Consequently

    $$ S - s = [f(x_1) - f(x_0)](x_1 - x_0) + \cdots + [f(x_n) - f(x_{n-1})](x_n - x_{n-1}), $$

    $$ S - s < \frac{\epsilon}{f(b) - f(a)} \{[f(x_1) - f(x_0)] + \cdots + [f(x_n) - f(x_{n-1})]\}. $$

Because of cancellation of terms and the facts that $x_0 = a$, $x_n = b$, this last
inequality becomes simply $S - s < \epsilon$. We then conclude that f is integrable, by
Theorem I.
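
The cancellation used at the end of the proof can be observed directly. The sketch below is an illustration under stated assumptions: the step function `floor(3x)` on [0, 1] and the helper name `gap_for_uniform_partition` are choices made for the example. For n equal subintervals the difference S - s telescopes to [f(b) - f(a)](b - a)/n, which here is 3/n, even though the function has jumps.

```python
from fractions import Fraction
from math import floor


def step(x):
    """A nondecreasing function on [0, 1] with jumps at 1/3, 2/3 and 1."""
    return floor(3 * x)


def gap_for_uniform_partition(f, a, b, n):
    """Return S - s for n equal subintervals of [a, b].

    Uses m_i = f(x_{i-1}) and M_i = f(x_i), valid for nondecreasing f.
    """
    points = [a + (b - a) * Fraction(i, n) for i in range(n + 1)]
    return sum((f(right) - f(left)) * (right - left)
               for left, right in zip(points, points[1:]))


a, b = Fraction(0), Fraction(1)
gaps = {n: gap_for_uniform_partition(step, a, b, n) for n in (10, 100, 1000)}
print(gaps)  # each gap equals 3/n, so it can be made smaller than any epsilon
```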
It is of course also true that f is integrable if f(x) never increases as x
increases. In this case $m_i = f(x_i)$ and $M_i = f(x_{i-1})$ on any subinterval $[x_{i-1}, x_i]$.

It is beyond the scope of this book to make an intensive study of the
question as to exactly what kinds of discontinuous functions are integrable. The
function of Example 1, §18.1, is not integrable, but it is discontinuous at every
point of the interval [0, 1]. For many practical purposes it is sufficient to know
that if f is bounded and has only a finite number of points of discontinuity on
[a, b], then it is integrable. The discussion of the following example will illustrate
a method by which the foregoing assertion may be proved.

Example 1. Suppose f is defined on [0, 2] by

    $$ f(x) = x \text{ if } 0 \le x < 1, \qquad f(x) = x - 1 \text{ if } 1 \le x \le 2. $$

This function is bounded on [0, 2], and continuous except at x = 1. The graph is
shown in Fig. 170. We shall use Theorem I to show that f is integrable. Suppose
$\epsilon > 0$. We form a partition of [0, 2] by taking $1 - (\epsilon/4)$ and $1 + (\epsilon/4)$ as two
consecutive points in the partition. The remaining points are taken between 0
and $1 - (\epsilon/4)$ and between $1 + (\epsilon/4)$ and 2, in a manner to be specified presently.
Observe that f is continuous on the intervals $[0, 1 - (\epsilon/4)]$, $[1 + (\epsilon/4), 2]$; the point of
discontinuity has been enclosed in the subinterval $[1 - (\epsilon/4), 1 + (\epsilon/4)]$, whose
length is $\epsilon/2$. Now consider the lower and upper sums s, S. Let $s_1$ represent the
contributions coming to s from the subintervals of $[0, 1 - (\epsilon/4)]$, and let $s_3$
represent the contributions from the subintervals of $[1 + (\epsilon/4), 2]$. Let $s_2$
represent the contribution from the single subinterval $[1 - (\epsilon/4), 1 + (\epsilon/4)]$, so that
$s = s_1 + s_2 + s_3$. With similar notations for upper sums we have $S = S_1 + S_2 + S_3$.
Now, the least upper bound of f(x) when $1 - (\epsilon/4) \le x \le 1 + (\epsilon/4)$ is 1, and the
least value is 0. Therefore

    $$ s_2 = 0 \cdot \left(\frac{\epsilon}{2}\right) = 0, \qquad S_2 = 1 \cdot \left(\frac{\epsilon}{2}\right) = \frac{\epsilon}{2}. $$

Since f is continuous on $[0, 1 - (\epsilon/4)]$ and $[1 + (\epsilon/4), 2]$, we can choose the part of
the partition in these intervals so that

    $$ S_1 - s_1 < \frac{\epsilon}{4} \quad \text{and} \quad S_3 - s_3 < \frac{\epsilon}{4}. $$

This is by Theorems III and II. Then

    $$ S - s = (S_1 - s_1) + (S_2 - s_2) + (S_3 - s_3) < \frac{\epsilon}{4} + \frac{\epsilon}{2} + \frac{\epsilon}{4} = \epsilon. $$

We then conclude by Theorem I that the function is integrable on [0, 2].

The kind of argument employed in the foregoing discussion has a much more 
general application, and by means of it we can prove the following theorem. 
THEOREM V. If f is bounded on [a, b], and if the points of [a, b] at which f is
discontinuous can be enclosed in a finite number of subintervals the sum of
whose lengths can be made as small as we please, then f is integrable. In
particular, f is integrable if it is bounded and has only a finite number of
points of discontinuity.

We omit the details of proof because of the similarity to the argument used 
in connection with Example 1. 

Example 2. Suppose f is defined and bounded on [0, 1], and is continuous
except at the points 1/2, 1/3, 1/4, 1/5, . . .. Then it is integrable; for, no matter how small a
positive $\epsilon$ we choose, we can enclose all the points of discontinuity in a finite
number of subintervals of total length less than $\epsilon$. To see that this is so, suppose
$0 < \epsilon < 1$. As a first subinterval choose $[0, \epsilon/4]$. This interval will enclose all but a
finite number of the points 1/2, 1/3, . . .. If there are N points not so enclosed, we can
enclose each of these remaining points in a subinterval of length $\epsilon/(2N)$. This
gives us N + 1 subintervals in all, the sum of whose lengths is $3\epsilon/4$, and they
enclose all the points at which f is discontinuous. Thus f is integrable, by
Theorem V.
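
The covering described in Example 2 can be built explicitly. The following sketch is illustrative only: it assumes, for the sake of the example, that the points of discontinuity are 1/k for k = 2, 3, . . . (any sequence of points converging to 0 is handled the same way), and the function name `cover_discontinuities` is hypothetical. Exact rational arithmetic is used so that the total length can be compared with $3\epsilon/4$.

```python
from fractions import Fraction


def cover_discontinuities(epsilon):
    """Enclose the points 1/2, 1/3, 1/4, ... in finitely many subintervals.

    The choice of the points 1/k (k >= 2) is an assumption of this sketch.
    Requires 0 < epsilon < 1, so that at least one point lies outside
    the first subinterval [0, epsilon/4].
    """
    first = (Fraction(0), epsilon / 4)   # contains every 1/k <= epsilon/4
    outside = []
    k = 2
    while Fraction(1, k) > epsilon / 4:  # the finitely many remaining points
        outside.append(Fraction(1, k))
        k += 1
    n = len(outside)
    half = epsilon / (4 * n)             # each interval has length epsilon/(2N)
    return [first] + [(p - half, p + half) for p in outside]


epsilon = Fraction(1, 10)
intervals = cover_discontinuities(epsilon)
total_length = sum(right - left for left, right in intervals)
print(len(intervals), total_length)      # N + 1 intervals, total 3*epsilon/4
```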

The condition for integrability in Theorem V is sufficient, but not necessary. 
The following theorem is true, but we shall not attempt to prove it. 

THEOREM VI. If f is bounded on [a, b], it is integrable if and only if the points 
at which f is discontinuous can be enclosed in a finite or denumerably infinite 
set of subintervals of total length as small as we please. 

18.2 /THE INTEGRAL AS A LIMIT OF SUMS 

In this section we shall show the equivalence between the definition of the 
integral as given in §18.1 and the definition which is used in elementary calculus 
(the one given in § 1.5). 

It is desirable to introduce some special terminology and notation in con- 
nection with partitions. If [a, b] is a fixed closed interval, let us use symbols such 
as P, $P_0$, P', . . . for various partitions of the interval. If P is the partition
determined by points $(x_0, x_1, \ldots, x_n)$, where $a = x_0$, $b = x_n$, the length of the longest
subinterval in the partition is called the mesh fineness of P, and is denoted in
symbols by |P|. Thus, by definition, |P| is the maximum of the differences

    $$ x_1 - x_0,\ x_2 - x_1,\ \ldots,\ x_n - x_{n-1}. $$

The notion of the integral as a limit of sums is based on consideration of sums
of the type

    $$ f(x_1')(x_1 - x_0) + f(x_2')(x_2 - x_1) + \cdots + f(x_n')(x_n - x_{n-1}). $$    (18.2-1)

Here $x_0, x_1, \ldots, x_n$ are the points of a partition P, and $x_1', \ldots, x_n'$ are any points
chosen so that $x_{i-1} \le x_i' \le x_i$, i = 1, . . . , n. The following theorem is fundamental
in the theory of integration:

THEOREM VII. Suppose f is bounded on the interval [a, b]. Then it is integrable 
if and only if the sums (18.2-1) approach a limit as the mesh fineness |P| 
approaches 0. This limit is then the same as the integral defined in §18.1. 

In order to prove Theorem VII it is best to begin by proving the following 
theorem, usually named after the French mathematician J. G. Darboux (1842- 
1917). 

THEOREM VIII. Suppose f is bounded on [a, b], and let s, S be the lower and
upper sums corresponding to a partition P. Then s approaches I and S
approaches J as $|P| \to 0$. This means that for any $\epsilon > 0$ there is some $\delta > 0$
such that

    $$ |s - I| < \epsilon \quad \text{and} \quad |S - J| < \epsilon \quad \text{if } |P| < \delta. $$

If we grant the truth of Darboux’s theorem, it is rather easy to prove Theorem
VII. Let us suppose that f is integrable. Now, if $x_{i-1} \le x_i' \le x_i$, we certainly have
$m_i \le f(x_i') \le M_i$ and therefore

    $$ s \le \sum_{i=1}^{n} f(x_i')(x_i - x_{i-1}) \le S. $$    (18.2-2)

As $|P| \to 0$, Darboux’s theorem asserts that s and S approach I and J
respectively. But $I = J = \int_a^b f(x)\,dx$, and so we see by (18.2-2) that the sums (18.2-1)
must approach $\int_a^b f(x)\,dx$ as $|P| \to 0$. On the other hand, if we assume that the sums
(18.2-1) approach some limit A, this means that all such sums lie between $A - \epsilon$
and $A + \epsilon$ if |P| is sufficiently small. But, if we choose such a partition and keep
it fixed, then by varying the choice of $x_1', \ldots, x_n'$ we can bring the sum (18.2-1) as
close as we please to either s or S. Consequently we must have

    $$ A - \epsilon \le s \quad \text{and} \quad S \le A + \epsilon. $$

But then $S - s \le 2\epsilon$. Since $\epsilon$ can be chosen as small as we please, we know by
Theorem I that f is integrable. This concludes the proof of Theorem VII.
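
Theorem VII can be illustrated numerically. In the sketch below the function $x^2$ on [0, 1], the uniform partitions, and the random choice of the intermediate points are all assumptions made for the example; the name `riemann_sum` is hypothetical. Since every sum of type (18.2-1) lies between s and S, and S - s = 1/n for this increasing function on n equal subintervals, the error cannot exceed 1/n whatever points are chosen.

```python
import random


def riemann_sum(f, points, rng):
    """Sum (18.2-1) with each x_i' drawn at random from [x_{i-1}, x_i]."""
    return sum(f(rng.uniform(left, right)) * (right - left)
               for left, right in zip(points, points[1:]))


def square(x):
    return x * x


rng = random.Random(0)  # fixed seed so that the run is repeatable
errors = {}
for n in (10, 100, 1000, 10000):
    points = [i / n for i in range(n + 1)]
    errors[n] = abs(riemann_sum(square, points, rng) - 1 / 3)

for n, err in errors.items():
    print(n, err)  # the error stays below the mesh fineness 1/n
```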

We still have to prove Theorem VIII. This proof is a bit intricate in detail.
Let us first establish the following fact: If S is the upper sum corresponding to a
partition P, and if S' is the new upper sum corresponding to a partition P'
obtained from P by inserting a single additional point, then

    $$ S - S' \le 2C|P|, $$    (18.2-3)

where C is the least upper bound of |f(x)| on [a, b]. To see this let us suppose
for definiteness that the new point $\xi$ is between $x_0$ and $x_1$, and use the notation as
in the proof of Lemma I, §18.1. Then

    $$ S - S' = M_1(x_1 - x_0) - M_1'(\xi - x_0) - M_1''(x_1 - \xi). $$

But since each of the M’s is in absolute value less than or equal to C, we
certainly have the inequality

    $$ S - S' \le C(x_1 - x_0) + C(\xi - x_0) + C(x_1 - \xi) = 2C(x_1 - x_0), $$

and so (18.2-3) is true, for $|x_1 - x_0| \le |P|$. By an extension of this argument we
see that

    $$ S - S' \le 2NC|P| $$    (18.2-4)

if P' is obtained from P by inserting N additional points, not more than one in
any one of the original subintervals of P.

Now we are ready to prove Darboux’s theorem. Suppose $\epsilon > 0$. Choose a
fixed partition $P_0$ whose upper sum $S_0$ satisfies $S_0 < J + (\epsilon/2)$. This can be done, since J is the
greatest lower bound of all possible upper sums. Suppose that h is the length
of the shortest subinterval in $P_0$, and suppose the points forming $P_0$ are
$\xi_0, \xi_1, \ldots, \xi_{N+1}$ (with $\xi_0 = a$, $\xi_{N+1} = b$). Choose $\delta$ so small that $\delta < h$ and
$2NC\delta < \epsilon/2$. Consider any partition P such that $|P| < \delta$. Let P' be the partition formed by
inserting the points $\xi_1, \ldots, \xi_N$ along with the points forming P. No more than
one of the $\xi$’s can fall into the same subinterval of P, so (18.2-4) will hold. Also,
$S' \le S_0$, by Lemma I. Thus

    $$ S - J = (S - S') + (S' - J) \le (S - S') + (S_0 - J) < 2NC|P| + \frac{\epsilon}{2} < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon $$

by the way things have been arranged. We know, moreover, that $0 \le S - J$.
Therefore $|S - J| < \epsilon$ if $|P| < \delta$. This proves the part of Darboux’s theorem
referring to S and J. A similar argument can be given for s and I, but there is a
device for deducing the case of s and I from the result already proved for S and
J; see Exercise 3.

EXERCISES 

1. Suppose f and g are integrable on [a, b]. Prove that the functions cf and f + g are
integrable on [a, b], and that

    $$ \int_a^b c f(x)\,dx = c \int_a^b f(x)\,dx, $$

    $$ \int_a^b [f(x) + g(x)]\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx. $$

Use Theorem VII and the basic theorems about limits, as stated in §1.6.

2. If f is integrable on [a, c] and a < b < c, prove that

    $$ \int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx. $$

Use Theorem VII and the result of Exercise 6, §18.1.

3. Let us write s(f), I(f) to denote the dependence of s and I on the function f, and
use similar notations for S, J. Show that s(f) = -S(-f) and I(f) = -J(-f). Now deduce
the part of Theorem VIII dealing with s and I from the part dealing with S and J.

18.21 / DUHAMEL'S PRINCIPLE

It sometimes happens that we need to consider sums which resemble those 
occurring in the expression of an integral as a limit of sums but which do not
have quite the right form to enable us to apply Theorem VII. It is useful to have 
a theorem which enables us to express the limit of certain sums as an integral, 
even though the situation is not one to which Theorem VII is applicable. We shall 
state such a theorem and discuss a few of its typical applications. 

First let us introduce the concept of what we shall call a law of partition
weighting. By this we mean a rule whereby to each partition P corresponds a set
of numbers, one number being assigned as a “weight” to each subinterval of the
partition. Thus, if P is the partition $(x_0, x_1, \ldots, x_n)$ of an interval [a, b], the law
of weighting will assign weights $\phi_1, \ldots, \phi_n$ to the respective subintervals
$[x_0, x_1], \ldots, [x_{n-1}, x_n]$. As an example of a law of partition weighting, suppose f is
a function defined on [a, b], and let the weights $\phi_1, \ldots, \phi_n$ be the values
$f(x_1'), \ldots, f(x_n')$, where $x_i'$ is chosen in $[x_{i-1}, x_i]$ according to some kind of rule. In
connection with a law of partition weighting we are going to consider sums of
the form

    $$ \phi_1(x_1 - x_0) + \phi_2(x_2 - x_1) + \cdots + \phi_n(x_n - x_{n-1}). $$

We shall call these weighted partition sums. Our general purpose is to describe
certain conditions under which these partition sums will approach an integral as
a limit when the mesh fineness |P| approaches 0. In the applications of this sort
of thing, if we have a certain law of partition weighting, this law is likely to be of
such a sort that it appears to be very nearly the same as a law in which the
weights are of the type $f(x_1'), \ldots, f(x_n')$ derived from some function f. This will
suggest that the limit of the partition sums is the integral $\int_a^b f(x)\,dx$. The problem
is to justify this surmise by sound reasoning. The key lies in considering the size
of the differences $\phi_1 - f(x_1'), \ldots, \phi_n - f(x_n')$.


THEOREM IX. Suppose we have a certain law of partition weighting, $\phi_1, \ldots, \phi_n$
corresponding to the subintervals of the partition P: $(x_0, x_1, \ldots, x_n)$. Suppose
also that f is a function integrable on [a, b], and suppose that there is some
choice of points $x_1', \ldots, x_n'$ in each partition such that the maximum of the
absolute values

    $$ |\phi_1 - f(x_1')|, \ldots, |\phi_n - f(x_n')| $$    (18.21-1)

approaches 0 as $|P| \to 0$. Then

    $$ \lim_{|P| \to 0} \sum_{i=1}^{n} \phi_i (x_i - x_{i-1}) = \int_a^b f(x)\,dx. $$    (18.21-2)

Proof. Suppose $\epsilon > 0$ and choose $\delta$ so small that the maximum of the
absolute values in (18.21-1) is less than $\epsilon$ if $|P| < \delta$. Our hypothesis means that
such a choice of $\delta$ is possible for each positive $\epsilon$. Then $|P| < \delta$ implies

    $$ \left| \sum_{i=1}^{n} [\phi_i - f(x_i')](x_i - x_{i-1}) \right| \le \sum_{i=1}^{n} |\phi_i - f(x_i')|(x_i - x_{i-1}) < \epsilon \sum_{i=1}^{n} (x_i - x_{i-1}) = \epsilon (b - a); $$

in other words,

    $$ \lim_{|P| \to 0} \sum_{i=1}^{n} [\phi_i - f(x_i')](x_i - x_{i-1}) = 0. $$

But this is the same as the assertion

    $$ \lim_{|P| \to 0} \sum_{i=1}^{n} \phi_i (x_i - x_{i-1}) = \lim_{|P| \to 0} \sum_{i=1}^{n} f(x_i')(x_i - x_{i-1}), $$

and this is equivalent to (18.21-2), by Theorem VII.


We shall refer to Theorem IX as Duhamel’s principle. Duhamel dealt with 
the problem of finding limits of sums of the same general type as those we have 
called partition sums associated with a law of partition weighting. In Duhamel’s 
time and for long afterward in calculus textbooks the treatment of such matters 
was regularly couched in the phraseology of “infinitesimals.” The treatment 
given here, avoiding the term “infinitesimal,” is patterned after a formulation 
given by W. F. Osgood. 

Example 1. Suppose that f and g are continuous functions defined on [a, b].
Let F(x) = f(x)g(x). Consider a law of partition weighting which arises by
choosing two points $x_i'$, $x_i''$ in each subinterval of an arbitrary partition
$(x_0, \ldots, x_n)$, and letting $\phi_i = f(x_i')g(x_i'')$. Then the limit of the corresponding
partition sums is the integral of the function F. That is,

    $$ \lim_{|P| \to 0} \sum_{i=1}^{n} f(x_i')g(x_i'')(x_i - x_{i-1}) = \int_a^b f(x)g(x)\,dx. $$    (18.21-4)

This is an application of Duhamel's principle. To see that it is a valid
application we have to verify that the maximum of the quantities

    $$ |f(x_i')g(x_i'') - F(x_i')| \qquad (i = 1, \ldots, n) $$    (18.21-5)

approaches 0 as $|P| \to 0$. The truth of (18.21-4) will then follow, for F is
continuous, and therefore integrable. We make use of the fact that f is bounded
and g is uniformly continuous. Let A be the maximum of |f(x)|. The uniform
continuity of g assures us that the maximum of the differences

    $$ |g(x_i'') - g(x_i')| \qquad (i = 1, \ldots, n) $$

approaches 0 as $|P| \to 0$, for $|x_i' - x_i''| \le |P|$. But, for the expression in (18.21-5)
we can write

    $$ |f(x_i')[g(x_i'') - g(x_i')]| \le A|g(x_i'') - g(x_i')|. $$

Since A is fixed, the assertion about (18.21-5) is seen to be true.

Formula (18.21-4) holds true under less restrictive assumptions on f and g;
see Exercise 3.

Formula (18.21-4) can be extended in an obvious way to the case of
products of three or more functions.
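
Formula (18.21-4) can be checked exactly in a simple case. The sketch below is illustrative only: it takes f(x) = g(x) = x on [0, 1], uses n equal subintervals, and chooses $x_i'$ as the left end point and $x_i''$ as the right end point of each subinterval; these choices and the name `two_tag_sum` are assumptions of the example. The weighted partition sum then works out to $(n^2 - 1)/(3n^2)$, which approaches the integral 1/3.

```python
from fractions import Fraction


def two_tag_sum(f, g, n):
    """Weighted partition sum with phi_i = f(x_{i-1}) * g(x_i) on [0, 1]."""
    points = [Fraction(i, n) for i in range(n + 1)]
    return sum(f(left) * g(right) * (right - left)
               for left, right in zip(points, points[1:]))


def identity(x):
    return x


sums = {n: two_tag_sum(identity, identity, n) for n in (10, 100, 1000)}
for n, value in sums.items():
    print(n, value, Fraction(1, 3) - value)  # the difference is 1/(3 n^2)
```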

Example 2. Consider the derivation of the integral formula for arc length of a
curve, as discussed in §14.2. Let C be a smooth arc defined by

    $$ x = f(t), \quad y = g(t), \quad z = h(t), \qquad a \le t \le b, $$

where f, g, h have continuous first derivatives. If $(t_0, t_1, \ldots, t_n)$ is a partition of
[a, b], the discussion in §14.2 shows that the arc length of C is the limit

    $$ \lim_{|P| \to 0} \sum_{i=1}^{n} \{[f'(\alpha_i)]^2 + [g'(\beta_i)]^2 + [h'(\gamma_i)]^2\}^{1/2} \Delta t_i, $$    (18.21-6)

where $\Delta t_i = t_i - t_{i-1}$ and $\alpha_i$, $\beta_i$, $\gamma_i$ are certain points of the interval $[t_{i-1}, t_i]$. Now if
we set

    $$ \phi_i = \{[f'(\alpha_i)]^2 + [g'(\beta_i)]^2 + [h'(\gamma_i)]^2\}^{1/2}, $$

we have a law of partition weighting. If it were true that $\alpha_i = \beta_i = \gamma_i$, the limit of
the partition sums would be the integral

    $$ \int_a^b \{[f'(t)]^2 + [g'(t)]^2 + [h'(t)]^2\}^{1/2}\,dt. $$    (18.21-7)

To show that the limit in (18.21-6) is equal to the integral in (18.21-7) we
compare $\phi_i$ with the corresponding expression in which $\beta_i$ and $\gamma_i$ are replaced by
$\alpha_i$, and show that Theorem IX is applicable (with the necessary changes in
notation). To do this we use the fact that the function

    $$ \{[f'(t)]^2 + [g'(u)]^2 + [h'(v)]^2\}^{1/2} $$

is uniformly continuous in the cubical region

    $$ a \le t \le b, \quad a \le u \le b, \quad a \le v \le b. $$

We omit the details.
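
A numerical experiment makes the conclusion of Example 2 plausible. The sketch below is an illustration under stated assumptions: the curve is the helix x = cos t, y = sin t, z = t on $[0, 2\pi]$, and the points $\alpha_i$, $\beta_i$, $\gamma_i$ are taken, arbitrarily, as the left end point, the midpoint, and the right end point of each subinterval. The function name `arc_length_sum` is hypothetical. For this curve the integral (18.21-7) has the value $2\pi\sqrt{2}$.

```python
import math


def arc_length_sum(n):
    """Sum (18.21-6) for the helix x = cos t, y = sin t, z = t on [0, 2*pi].

    alpha_i is the left end point and beta_i the midpoint of [t_{i-1}, t_i];
    gamma_i plays no role here because h'(t) = 1 is constant.
    """
    dt = 2 * math.pi / n
    total = 0.0
    for i in range(1, n + 1):
        alpha, beta = (i - 1) * dt, (i - 0.5) * dt
        fp = -math.sin(alpha)  # f'(alpha_i)
        gp = math.cos(beta)    # g'(beta_i)
        hp = 1.0               # h'(gamma_i)
        total += math.sqrt(fp * fp + gp * gp + hp * hp) * dt
    return total


exact = 2 * math.pi * math.sqrt(2)  # value of the integral (18.21-7)
approximations = {n: arc_length_sum(n) for n in (10, 100, 1000)}
for n, value in approximations.items():
    print(n, value, abs(value - exact))
```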


EXERCISES 

1. Suppose F is continuous on [a, b] and that f is defined and has a continuous
derivative on [a, b]. Using the standard notation relating to partitions, find the limit

    $$ \lim_{|P| \to 0} \sum_{i=1}^{n} F(x_i')[f(x_i) - f(x_{i-1})], $$

where $x_i'$ is any point on the ith subinterval.

2. Suppose f and g are continuous on [a, b], and F(u, v) is a continuous function of
u and v for all values of u and v obtained by setting u = f(x), v = g(y) and letting x, y
vary over [a, b]. Use Theorem IX to show that

    $$ \lim_{|P| \to 0} \sum_{i=1}^{n} F[f(x_i'), g(x_i'')](x_i - x_{i-1}) = \int_a^b F[f(x), g(x)]\,dx. $$

Explain carefully where uniform continuity comes into the argument.

3. Show that (18.21-4) is true if it is assumed that f is bounded and that f(x)g(x) and
g(x) are integrable. For the proof it is enough to show that

    $$ \lim_{|P| \to 0} \sum_{i=1}^{n} f(x_i')[g(x_i'') - g(x_i')](x_i - x_{i-1}) = 0. $$

But, if $|f(x)| \le A$, and if S and s are upper and lower sums for g, show that the expression
following the limit symbol above is not larger than A(S - s). Then use a certain theorem
employing the fact that g is integrable.


18.3 /FURTHER DISCUSSION OF INTEGRALS 

In our theory of the Riemann integral we have been assuming that a < b. The 
cases of equal upper and lower limits and of lower limit larger than upper limit 
are handled exactly as in (1.52-2), (1.52-3). The way in which integrals over 
adjacent intervals are combined is indicated in Exercise 7, §18.1, and Exercise 2, 
§18.2. All these things are familiar from elementary calculus. 

Inequalities for integrals are used a great deal. The basic fact is that if
$f(x) \le g(x)$ on [a, b], then (assuming a < b)

    $$ \int_a^b f(x)\,dx \le \int_a^b g(x)\,dx. $$    (18.3-1)

In particular, since $f(x) \le |f(x)|$ and $-|f(x)| \le f(x)$, we have

    $$ \left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx. $$

The inequality (18.3-1) is easily proved by analytical reasoning on either upper
or lower sums. For the fact that |f| is integrable if f is integrable, see Exercise 5,
§18.1.

Finally, in this miscellany of remarks, it should be stated that the product of 
two integrable functions is integrable. This result can be proved in various ways. 
It is an immediate consequence of the rather advanced Theorem VI. It can be 
proved by more elementary means, however, though the arguments are rather 
ingenious. We shall not give the details. 


18.4 / THE INTEGRAL AS A FUNCTION OF THE UPPER LIMIT

In this section we consider 


$$F(x) = \int_a^x f(t)\,dt, \tag{18.4-1}$$

where $a \le x \le b$ and $f$ is assumed to be integrable on $[a, b]$.

THEOREM X. The function F is continuous. 

This is very easily proved. Consider any two points $x'$, $x''$, and suppose
$x' < x''$. Then

$$F(x'') - F(x') = \int_a^{x''} f(t)\,dt - \int_a^{x'} f(t)\,dt = \int_{x'}^{x''} f(t)\,dt.$$

Let $C$ be the least upper bound of $|f(t)|$ on $[a, b]$. Then

$$|F(x'') - F(x')| \le \int_{x'}^{x''} |f(t)|\,dt \le C\,|x'' - x'|.$$

The continuity of F is a direct consequence of this inequality. 

In §1.52 we considered the same formula (18.4-1), but on the assumption
that $f$ was continuous at each point of $[a, b]$. It was then proved (Theorem VII
of §1.52) that $F'(x) = f(x)$ at each point. In the present case we are not assuming
the continuity of $f$, but merely its integrability. With this less restrictive
assumption we cannot conclude quite as much as before.

THEOREM XI. The formula $F'(x) = f(x)$ holds at each point where $f$ is continuous.


Proof. Let $x_0$ be a point at which $f$ is continuous. If $\epsilon > 0$, choose $\delta$ so that

$$|f(x) - f(x_0)| < \epsilon \quad \text{if } |x - x_0| < \delta. \tag{18.4-2}$$

We wish to show that

$$\lim_{h \to 0} \left[ \frac{F(x_0 + h) - F(x_0)}{h} - f(x_0) \right] = 0. \tag{18.4-3}$$

Now

$$F(x_0 + h) - F(x_0) = \int_a^{x_0 + h} f(t)\,dt - \int_a^{x_0} f(t)\,dt = \int_{x_0}^{x_0 + h} f(t)\,dt,$$

and

$$f(x_0) = \frac{1}{h} \int_{x_0}^{x_0 + h} f(x_0)\,dt.$$

Therefore

$$\frac{F(x_0 + h) - F(x_0)}{h} - f(x_0) = \frac{1}{h} \int_{x_0}^{x_0 + h} [f(t) - f(x_0)]\,dt. \tag{18.4-4}$$

We now suppose that $0 < |h| < \delta$. Then the expression under the integral sign on
the right side of (18.4-4) is in absolute value less than $\epsilon$, by (18.4-2), since
$|t - x_0| \le |h| < \delta$. Consequently the whole right side of (18.4-4) is less than $\epsilon$
when $0 < |h| < \delta$. This proves (18.4-3) and thus proves the theorem.
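
As a purely numerical illustration of Theorems X and XI (it is not part of the argument above), the following Python sketch uses a hypothetical integrand with a single jump: $f(t) = 0$ for $t < 1/2$ and $f(t) = 2$ for $t \ge 1/2$ on $[0, 1]$. For this particular choice the integral can be written down by hand as $F(x) = 2\max(0, x - 1/2)$. The function names and the sample points are assumptions made only for the example.

```python
def f(t):
    # Hypothetical integrand: integrable on [0, 1], with one jump at t = 0.5.
    return 0.0 if t < 0.5 else 2.0

def F(x):
    # Integral of f from 0 to x, worked out by hand for this particular f.
    return 2.0 * max(0.0, x - 0.5)

def quotient(x, h):
    # Difference quotient (F(x + h) - F(x)) / h used in the proof of Theorem XI.
    return (F(x + h) - F(x)) / h

h = 1e-3
slope_at_continuity = quotient(0.8, h)   # f is continuous at 0.8
slope_right = quotient(0.5, h)           # one-sided quotient to the right of the jump
slope_left = quotient(0.5, -h)           # one-sided quotient to the left of the jump
jump_in_F = abs(F(0.5 + h) - F(0.5 - h)) # F itself changes very little across 0.5

print(f(0.8), slope_at_continuity)
print(slope_left, slope_right, jump_in_F)
```

At the point of continuity the quotient agrees with $f(0.8)$, as Theorem XI asserts. At the jump the two one-sided quotients settle on different values, so $F$ has no derivative there, although $F$ is still continuous in accordance with Theorem X.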


EXERCISES 

1. Use Theorem X to prove that the value of $\int_a^b f(x)\,dx$ is not changed if the function
$f$ is altered merely by changing the value of $f(b)$.

2. If $f$ is integrable and $G(x) = \int_x^b f(t)\,dt$, show that $G$ is continuous by finding the
relation between $F$ and $G$. (Assume $F$ defined by (18.4-1).)

3. Prove the result corresponding to that in Exercise 1 if $f$ is altered merely by
changing the value of $f(a)$.

4. Suppose $f$ is defined on $[0, 1]$ as follows: $f(x) = (-1)^n$ if $1/2^{n+1} < x \le 1/2^n$,
$n = 0, 1, 2, \ldots$, and $f(0) = 0$. Find the value of $\int_0^1 f(x)\,dx$ by using the result of Exercise 2.

5. If $F(x) = \int_0^x \sqrt{(1 - t^2)(2 - t^2)}\,dt$, sketch the graph of $y = F(x)$ when $-1 \le x \le 1$,
using information derived from $F'(x)$ and $F''(x)$. The integral is an elliptic integral, and
cannot be expressed in terms of elementary functions.


6. Show that $\dfrac{d}{dx} \int_x^b f(t)\,dt = -f(x)$ if $f$ is continuous at $x$.


7. Let $F(x) = \int_{-1}^{x} t\,|t|\,dt$. (a) Find $F'(x)$ by Theorem XI. (b) Find explicit formulas
for $F(x)$ for the separate cases $x < 0$, $x > 0$. (c) Draw the graph of $y = F(x)$.


8. Let $f(x) = 1$ if $0 \le x \le 1$, $f(x) = -1$ if $-1 \le x < 0$. Define $F(x)$ by (18.4-1) with
$a = -1$. (a) Find separate explicit formulas for $F(x)$ if $-1 \le x \le 0$ and $0 \le x \le 1$.
(b) Draw the graph of $y = F(x)$ and check the validity of Theorems X, XI. What
about $F'(0)$?


18.41 / THE INTEGRAL OF A DERIVATIVE 

In this section we shall prove a generalization of Theorem VIII, §1.53. 

THEOREM XII. Suppose $f$ is integrable on $[a, b]$, and suppose there is a
function $F$ which is continuous on $[a, b]$ and such that, for each $x$ on the
open interval $a < x < b$, $F$ has a derivative given by $F'(x) = f(x)$. Then

$$\int_a^b f(x)\,dx = F(b) - F(a). \tag{18.41-1}$$

Proof. Consider any partition $(x_0, x_1, \ldots, x_n)$ of $[a, b]$. We apply the law of
the mean (Theorem IV, §1.2) to $F(x)$ on each of the subintervals $[x_{i-1}, x_i]$,
$i = 1, \ldots, n$. There is some point $x_i'$ such that $x_{i-1} < x_i' < x_i$ and

$$F(x_i) - F(x_{i-1}) = (x_i - x_{i-1})\,F'(x_i').$$

Thus, noting that $F'(x_i') = f(x_i')$, we have

$$\begin{aligned}
F(x_1) - F(x_0) &= (x_1 - x_0)\,f(x_1') \\
F(x_2) - F(x_1) &= (x_2 - x_1)\,f(x_2') \\
&\;\;\vdots \\
F(x_n) - F(x_{n-1}) &= (x_n - x_{n-1})\,f(x_n').
\end{aligned}$$

Adding, and recalling that $x_0 = a$, $x_n = b$, we see that

$$F(b) - F(a) = \sum_{i=1}^{n} f(x_i')\,(x_i - x_{i-1}). \tag{18.41-2}$$

This kind of relation holds for each partition. Hence by Theorem VII we see that 
(18.41-1) is true. 

The student will observe, on comparing the present theorem with Theorem
VIII, §1.53, that the conclusions in the two theorems are identical. In the earlier
theorem we assumed that $f$ was continuous, whereas here we have assumed
merely that $f$ is integrable.

Example. Find the value of

$$\int_0^{2/\pi} \left( 2x \sin\frac{1}{x} - \cos\frac{1}{x} \right) dx.$$

Here

$$f(x) = 2x \sin\frac{1}{x} - \cos\frac{1}{x}$$

when $x \ne 0$. We may define $f(0)$ in any manner we please, but $f$ will be
discontinuous at $x = 0$ in any case. However, $f$ is continuous for other values of
$x$, and bounded; therefore, it is integrable. To evaluate the integral we define

$$F(x) = x^2 \sin\frac{1}{x} \quad \text{if } x \ne 0, \qquad F(0) = 0.$$

The definition $F(0) = 0$ makes $F$ continuous at $x = 0$. For $x \ne 0$ we readily verify
that $F'(x) = f(x)$. We observe, incidentally, that $F'(0) = 0$; this may or may not
be $f(0)$, depending on how we define $f(0)$. The conditions of Theorem XII are
satisfied, and so

$$\int_0^{2/\pi} \left( 2x \sin\frac{1}{x} - \cos\frac{1}{x} \right) dx = F\!\left(\frac{2}{\pi}\right) - F(0) = \frac{4}{\pi^2}.$$


18.5 / INTEGRALS DEPENDING ON A PARAMETER 


Our concern in this section is with integrals in which a parameter occurs under 
the integral sign or in the limits of integration, or in both places. The main 
objective is to learn about differentiation with respect to the parameter in such 
cases. We begin with the situation in which the parameter occurs solely in the 
integrand. Consider 


$$F(y) = \int_a^b f(x, y)\,dx. \tag{18.5-1}$$

We shall suppose that $f$ is defined when $(x, y)$ is a point of the rectangle in the
$xy$-plane defined by $a \le x \le b$, $c \le y \le d$. We denote this rectangle by $R$.

It is important for us to know conditions on the function $f$ that will
guarantee the continuity of $F$. We do not attempt to give the most general (i.e.,
the least restrictive) conditions. For usefulness in practice the following theorem is
convenient.

THEOREM XIII. If $f$ is continuous at each point of $R$, then $F$ is continuous for
each $y$ on the interval $[c, d]$.

The proof depends upon the fact that $f$ is uniformly continuous in $R$: see
Exercise 9.

The next theorem deals with finding the derivative of $F(y)$.


THEOREM XIV. Suppose that $f(x, y)$ is an integrable function of $x$ for each
value of $y$, and that the partial derivative $\partial f/\partial y$ exists and is a continuous
function of $x$ and $y$ in the rectangle $R$. Then $F(y)$ has a derivative given by

$$F'(y) = \int_a^b \frac{\partial f(x, y)}{\partial y}\,dx. \tag{18.5-2}$$

The formula (18.5-2) is often called Leibniz's rule.


Proof. To give the proof let us use the notation $f_2(x, y)$ for the partial
derivative with respect to $y$. Formula (18.5-2) is equivalent to the statement

$$\lim_{h \to 0} \left[ \frac{F(y + h) - F(y)}{h} - \int_a^b f_2(x, y)\,dx \right] = 0. \tag{18.5-3}$$

Now

$$F(y + h) - F(y) = \int_a^b [f(x, y + h) - f(x, y)]\,dx.$$

We apply the law of the mean to $f(x, y)$ as a function of $y$:

$$f(x, y + h) - f(x, y) = h\,f_2(x, y + \theta h).$$

Here $\theta$ is a number depending on $x$, $y$, $h$, and such that $0 < \theta < 1$. From this last
formula we see that

$$\frac{F(y + h) - F(y)}{h} - \int_a^b f_2(x, y)\,dx = \int_a^b [f_2(x, y + \theta h) - f_2(x, y)]\,dx. \tag{18.5-4}$$

At this stage we make use of the fact that $f_2$ is uniformly continuous in $R$ (by
Theorem V, §17.4). Suppose $\epsilon > 0$. Choose $\delta$ so that the values of $f_2$ at two
different points of $R$ differ by less than $\epsilon$ if the distance between the points is less
than $\delta$. Then certainly

$$|f_2(x, y + \theta h) - f_2(x, y)| < \epsilon$$

if $|h| < \delta$. Consequently, we see by (18.5-4) that

$$\left| \frac{F(y + h) - F(y)}{h} - \int_a^b f_2(x, y)\,dx \right| < \epsilon (b - a)$$

if $0 < |h| < \delta$. Since $\epsilon$ can be as small as we please, this proves (18.5-3).

Example 1. Find $F'(y)$ if

$$F(y) = \int_0^1 \log(x^2 + y^2)\,dx.$$

We can apply Theorem XIV with $0 \le x \le 1$ and $y$ on any closed interval not
containing $y = 0$. The result is

$$F'(y) = \int_0^1 \frac{2y}{x^2 + y^2}\,dx.$$

The integration is readily performed, and we find

$$F'(y) = 2 \tan^{-1}\!\left(\frac{1}{y}\right).$$
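
The result of Example 1 can be checked numerically. The Python sketch below is only an illustration under stated assumptions: it approximates the integrals by a composite Simpson rule (a choice made here, not a method used in the text), takes the sample value $y = 0.7$, and compares a central difference quotient of $F$ with the integral of the partial derivative and with the closed form $2\tan^{-1}(1/y)$. The helper names `simpson`, `F`, and `F_prime_leibniz` are hypothetical.

```python
import math

def simpson(func, a, b, n=2000):
    # Composite Simpson rule on [a, b] with n (even) subintervals.
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

def F(y):
    # F(y) = integral over 0 <= x <= 1 of log(x^2 + y^2).
    return simpson(lambda x: math.log(x * x + y * y), 0.0, 1.0)

def F_prime_leibniz(y):
    # Right side of Leibniz's rule: integrate the partial derivative in y.
    return simpson(lambda x: 2 * y / (x * x + y * y), 0.0, 1.0)

y = 0.7      # sample value, chosen away from y = 0
h = 1e-5
central_difference = (F(y + h) - F(y - h)) / (2 * h)
closed_form = 2 * math.atan(1 / y)

print(central_difference, F_prime_leibniz(y), closed_form)
```

The three printed numbers should agree to several decimal places. Taking $y$ close to 0 makes the integrand nearly singular at $x = 0$, which is consistent with the restriction in the example to closed intervals not containing $y = 0$.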

If the parameter occurs merely in the limits of integration we can use 
Theorem XI and the standard chain rule for differentiating composite functions. 
We illustrate by an example. 

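Example 2. Suppose

$$F(y) = \int_{\sin y}^{e^y} \sqrt{1 + x^3}\,dx.$$

How is $F'(y)$ found in this case? We set

$$G(u, v) = \int_u^v \sqrt{1 + x^3}\,dx.$$

Then $G(u, v)$ becomes $F(y)$ if we put $u = \sin y$, $v = e^y$. Therefore

$$\frac{dF}{dy} = \frac{\partial G}{\partial u}\frac{du}{dy} + \frac{\partial G}{\partial v}\frac{dv}{dy}.$$

But, by Theorem XI,

$$\frac{\partial G}{\partial u} = -\sqrt{1 + u^3}, \qquad \frac{\partial G}{\partial v} = \sqrt{1 + v^3}.$$

(How does the minus sign get into the first formula?) Therefore

$$F'(y) = -\sqrt{1 + \sin^3 y}\,\cos y + \sqrt{1 + e^{3y}}\,e^y.$$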
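
As a sanity check on Example 2, the following Python sketch compares the formula for $F'(y)$ with a central difference quotient of $F$. It is a numerical illustration only: the Simpson-rule helper, the sample value $y = 0.5$, and the step size are assumptions made for the example.

```python
import math

def simpson(func, a, b, n=2000):
    # Composite Simpson rule on [a, b] with n (even) subintervals.
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

def F(y):
    # Both limits of integration depend on the parameter y.
    return simpson(lambda x: math.sqrt(1 + x ** 3), math.sin(y), math.exp(y))

def F_prime_formula(y):
    # Chain rule: -sqrt(1 + u^3) * du/dy + sqrt(1 + v^3) * dv/dy
    # with u = sin y and v = e^y.
    return (-math.sqrt(1 + math.sin(y) ** 3) * math.cos(y)
            + math.sqrt(1 + math.exp(3 * y)) * math.exp(y))

y = 0.5
h = 1e-5
central_difference = (F(y + h) - F(y - h)) / (2 * h)

print(central_difference, F_prime_formula(y))
```

The two printed values should agree closely. Dropping the minus sign on the first term of `F_prime_formula` makes the agreement fail, which is one way to see that the lower limit contributes with the opposite sign.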

If the parameter occurs both under the integral sign and in the limits of 
integration, we use both Theorem XI and Theorem XIV and the chain rule. 

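Example 3. Suppose

$$F(y) = \int_0^{y^2} x^5 (y - x)^7\,dx.$$

Here we can define

$$G(u, v) = \int_0^u x^5 (v - x)^7\,dx$$

and put $u = y^2$, $v = y$. Then

$$\frac{\partial G}{\partial u} = u^5 (v - u)^7, \qquad \frac{\partial G}{\partial v} = \int_0^u 7x^5 (v - x)^6\,dx.$$

Since $\dfrac{du}{dy} = 2y$, $\dfrac{dv}{dy} = 1$, the chain rule gives

$$F'(y) = y^{10} (y - y^2)^7 \cdot 2y + \int_0^{y^2} 7x^5 (y - x)^6\,dx.$$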


With a little practice the student will be able to handle problems of this kind 
without explicit introduction of auxiliary variables. 

EXERCISES 

1. If $F(u) = \int_0^1 \log(1 - u^2 x^2)\,dx$, find $F'(u)$ by Leibniz's rule. What is the value of
$F'(0)$? Show without performing any integrations that $F(u)$ has a maximum value at $u = 0$
and that the graph of $v = F(u)$ is concave downward in the $uv$-plane.

2. If $F(a) = \int_0^1 \dfrac{a\,dx}{\sqrt{1 - a^2 x^2}}$, find $F'(a)$ by two different methods.


3. If F(x) = Jo* ‘ dt, find F'(x). 

4. If F(x) = / j x ( x 2 -t 2 ) n dt, find F'(x). 

5. Suppose $\phi(y) = \int_0^y x^n (y - x)^m\,dx$, where $m$ and $n$ are positive integers. Calculate
$\phi'(y), \phi''(y), \ldots$ without integration, and show that

$$\phi^{(m)}(y) = \frac{m!}{n + 1}\,y^{n+1}.$$

Then, observing that $\phi(0) = \phi'(0) = \cdots = \phi^{(m-1)}(0) = 0$, integrate $\phi^{(m)}(y)$ successively $m$
times and so arrive at the formula

$$\phi(y) = \frac{m!\,n!}{(m + n + 1)!}\,y^{m+n+1}.$$

6. If $u = \int_{x-ct}^{x+ct} \phi(s)\,ds$, where $\phi$ has a continuous derivative, show that
$\dfrac{\partial^2 u}{\partial t^2} = c^2\,\dfrac{\partial^2 u}{\partial x^2}$.

7. If $u = \int_{1/x}^{y} dt \int_{1/t}^{x} f(s, t)\,ds$, show that $\dfrac{\partial^2 u}{\partial x\,\partial y} = f(x, y)$, assuming that $f$ is
continuous. Find the other second derivatives of $u$, assuming that $f$ has continuous first
partial derivatives.

8. Suppose $M(x, y)$ and $N(x, y)$ are continuous in a rectangle with center at $(a, b)$,
and that the partial derivatives $\dfrac{\partial M}{\partial y}$ and $\dfrac{\partial N}{\partial x}$ are continuous and equal in the rectangle.
Show by Theorems XI, XII, and XIV that $\dfrac{\partial f}{\partial x} = M$ and $\dfrac{\partial f}{\partial y} = N$ if

$$f(x, y) = \int_a^x M(s, b)\,ds + \int_b^y N(x, t)\,dt.$$

9. Prove Theorem XIII, using the uniform continuity of $f$ in much the same way that
the uniform continuity of $f_2$ was used in proving Theorem XIV.


18.6 / RIEMANN DOUBLE INTEGRALS 

In this section we shall discuss the theory of Riemann double integrals

$$\iint_R f(x, y)\,dA \tag{18.6-1}$$

in a manner paralleling the theory for single integrals, as developed in the first
part of this chapter. We shall be much briefer in the theory of double integrals,
and many proofs will be omitted.

It is simplest to begin with the case in which $R$ is a rectangle with sides
parallel to the co-ordinate axes. Suppose these sides are $x = a$, $x = b$, $y = c$,
$y = d$, where $a < b$ and $c < d$. If we form a partition of $[a, b]$ by points
$(x_0, x_1, \ldots, x_m)$, and a partition of $[c, d]$ by points $(y_0, y_1, \ldots, y_n)$, the lines $x = x_i$
and $y = y_j$ form a rectangular partition of the rectangle $R$ into rectangular cells.
The numbers $m$, $n$ need not be equal; the total number of cells in the partition of
$R$ is $N = mn$.

Now suppose that $f$ is a function which is defined and bounded in $R$. If the
cells are numbered in any order, let $m_1, \ldots, m_N$ be the greatest lower bounds of
$f(x, y)$ in the various cells, and let $M_1, \ldots, M_N$ denote the corresponding least
upper bounds of $f(x, y)$. If the areas of the cells are $\Delta A_1, \ldots, \Delta A_N$, we form the
lower sum

$$s = m_1\,\Delta A_1 + \cdots + m_N\,\Delta A_N$$

and the upper sum

$$S = M_1\,\Delta A_1 + \cdots + M_N\,\Delta A_N.$$

Then we denote the least upper bound of all possible lower sums by $I$, and the
greatest lower bound of all possible upper sums by $J$.

Definition. If $I = J$, we say that $f$ is integrable over $R$, and we define the double
integral (18.6-1) as the common value of $I$ and $J$.

Starting from this definition we can obtain analogues of the lemmas and
theorems in §18.1. It can be proved that if $f$ is continuous in the closed rectangle
$R$, it is integrable. The proof, using uniform continuity, is much like the proof of
Theorem III, §18.11.

A function may have certain points of discontinuity and yet be integrable.
For example, if $f$ is continuous in $R$ except at certain points which lie on a finite
number of smooth curves, it can be shown that $f$ is integrable. An instance is
furnished by the function defined by $f(x, y) = 1$ if $x^2 + y^2 \le 1$, $f(x, y) = 0$ if
$x^2 + y^2 > 1$ and $-2 \le x \le 2$, $-2 \le y \le 2$. This function is discontinuous at the
points of the circle $x^2 + y^2 = 1$, but it is integrable over the square in which it is
defined.

It will be very useful in some later work to be able to deal with integrable 
functions having some points of discontinuity. On that account we shall discuss 
such matters a little further here. As an aid in this discussion we introduce the 
concept of the outer content of a set of points in the xy-plane. 

Definition. Let $T$ be a point set in the rectangle $R$. Consider any rectangular partition
of $R$. Select all those cells of the partition which contain points of $T$, and let $A$ denote
the sum of the areas of all these cells. If we consider all possible partitions, the
greatest lower bound of the values of $A$ is defined as the outer content of $T$.

It will be seen that if we define a function $f$ such that $f(x, y) = 1$ if $(x, y)$ is a
point of $T$, and $f(x, y) = 0$ elsewhere in $R$, then $A$ is the upper sum $S$ for this
function, and consequently the number $J$ is the outer content of $T$.

It is also quite evident that the outer content of T depends on T alone, not 
on the choice of the rectangle R containing T. 

THEOREM XV. Let $f$ be any function which is bounded in $R$, and let $T$ be the set
of points in $R$ at which $f$ is discontinuous. Then $f$ is integrable provided the
outer content of $T$ is 0.

This theorem is analogous to Theorem V, §18.12. We omit the proof. The
condition of zero outer content is sufficient, but not necessary, for $f$ to be
integrable.

It is easy to see that a straight line segment has zero outer content. Also, any 
smooth arc has zero outer content (see Exercise 5). The boundaries of regions of 
familiar shapes, such as squares, circles, ellipses, all have zero outer content. 
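
The statement that a circle has zero outer content can be made plausible by a small computation. The Python sketch below is an illustration under stated assumptions, not a proof: it takes $T$ to be the circle $x^2 + y^2 = 1$ inside the square $-2 \le x \le 2$, $-2 \le y \le 2$, uses only partitions into $n \times n$ equal cells, and adds up the areas of the cells that contain points of $T$. The function names are hypothetical.

```python
def cell_meets_circle(x0, x1, y0, y1):
    # A closed cell meets the unit circle exactly when the distance from the
    # origin to the nearest point of the cell is <= 1 and the distance to the
    # farthest corner is >= 1 (the cell is connected, so distance 1 is attained).
    nearest_x = min(max(0.0, x0), x1)
    nearest_y = min(max(0.0, y0), y1)
    nearest_sq = nearest_x ** 2 + nearest_y ** 2
    farthest_x = max(abs(x0), abs(x1))
    farthest_y = max(abs(y0), abs(y1))
    farthest_sq = farthest_x ** 2 + farthest_y ** 2
    return nearest_sq <= 1.0 <= farthest_sq

def covering_area(n):
    # Sum A of the areas of all cells of an n-by-n partition of the square
    # [-2, 2] x [-2, 2] that contain points of the circle.
    h = 4.0 / n
    count = 0
    for i in range(n):
        for j in range(n):
            x0 = -2.0 + i * h
            y0 = -2.0 + j * h
            if cell_meets_circle(x0, x0 + h, y0, y0 + h):
                count += 1
    return count * h * h

areas = {n: covering_area(n) for n in (25, 50, 100, 200)}
for n, area in areas.items():
    print(n, area)
```

The printed sums shrink roughly in proportion to $1/n$, which suggests that the greatest lower bound of the values of $A$ is 0. Exercise 5 below gives the argument for a general smooth arc.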

Next we turn to the definition of double integrals over regions which are 
not rectangles with sides parallel to the axes. We confine our attention to 
bounded regions with boundaries of outer content zero. For convenience we 
shall refer to regions of this type as Riemann regions. 

Suppose $G$ is a Riemann region. Let $R$ be a rectangle containing $G$ and
having its sides parallel to the axes. Suppose $f$ is a function which is defined in
$G$. We shall define a new function throughout $R$ by setting $g(x, y) = f(x, y)$ at
points of $G$ and $g(x, y) = 0$ at points of $R$ not in $G$. We then say that $f$ is
integrable over $G$ if $g$ is integrable over $R$, and in that case we define

$$\iint_G f(x, y)\,dA = \iint_R g(x, y)\,dA. \tag{18.6-2}$$

This procedure makes everything depend on the theory of integrals over
rectangles. Alternative procedures are possible. We could define upper and
lower sums directly for $f$ and the region $G$. In that case it becomes necessary to
decide what to do with cells of a partition in case the cells contain points in $G$ as
well as points not in $G$. The fact that the boundary of $G$ has zero outer content
makes it immaterial whether such cells are ignored or not.

Darboux’s theorem (Theorem VIII, §18.2) can be generalized to the case of
double integrals, and as a result we get the important fact that a double integral 
can be expressed as a limit of sums (the analogue of Theorem VII). Thus we 
make connection with the earlier definition of double integrals in §13.2. 

EXERCISES 

1. Suppose /(x, y) = tan -1 if x^ y, and define /(x, y) = 0 when x = y. Is / 

integrable over the square $0 \le x \le 1$, $0 \le y \le 1$?

2. Suppose $f(x, y) = \sin(1/xy)$ if $xy \ne 0$, and define $f(x, y) = 1$ if $x$ or $y = 0$. Is $f$
integrable over the square $-1 \le x \le 1$, $-1 \le y \le 1$?

3. Suppose $f(x, y) = 0$ if $x + y$ is rational, and $f(x, y) = 1$ if $x + y$ is irrational. Is $f$
integrable over the square $0 \le x \le 1$, $0 \le y \le 1$?

4. Suppose $f(x)$ is a continuous function defined when $a \le x \le b$. Let $T$ be the set of
points on the graph of $y = f(x)$ in the $xy$-plane. Is it possible for $T$ to have positive outer
content? Justify your answer.

5. Let $C$ be a smooth arc with parametric equations $x = f(t)$, $y = g(t)$, $a \le t \le b$.
Divide $[a, b]$ into $n$ equal parts by points $t_0, \ldots, t_n$; let $\delta = (b - a)/n$ and let $(x_i, y_i)$ be the
point of $C$ corresponding to $t_i$. Choose $M$ so that $|f'(t)|$ and $|g'(t)|$ do not exceed $M$ on
$[a, b]$. Use the law of the mean to show that the points of $C$ for $t_{i-1} \le t \le t_i$ satisfy the
inequalities $|x - x_i| \le M\delta$, $|y - y_i| \le M\delta$, and hence that $C$ is covered by $n$ squares of total
area $4M^2(b - a)\delta$. What do you conclude about the outer content of $C$?


18.61 / DOUBLE INTEGRALS AND ITERATED INTEGRALS 

In §13.3 we gave an account of the procedure for expressing a double integral 
as an iterated integral. The procedure is summed up in the two formulas (13.3-5), 
(13.3-6), and a formal statement about the first of these formulas was made in 
Theorem III, §13.3. However, we did not actually prove the theorem; we merely 
made an argument for its plausibility, by interpreting both the double integral 
and the iterated integral as expressions for a certain volume. The student will do 
well to read §13.3 as far as Example 1 before proceeding further with the present 
section. 

We now wish to give a strictly analytical proof of the relation between 
double integrals and iterated integrals. This proof is necessary if we are to have 
a firm logical justification for the procedures used in evaluating double integrals. 
The proof is, moreover, very instructive for the prospective student of more 
advanced mathematics, for it is fairly representative of a type of proof which is 
encountered in a variety of different forms in higher analysis. The basic principle 
is that of showing that certain limit processes of rather complicated nature can 
be replaced by two successive limit processes of simpler nature. 

We begin with the case of a double integral over a rectangular region. 

THEOREM XVI. Let $R$ be the rectangle $a \le y \le b$, $c \le x \le d$, where $a < b$, $c < d$.
Suppose that $f(x, y)$ is defined and integrable over $R$. For each $x$ suppose that
$f(x, y)$ is integrable with respect to $y$ over $[a, b]$, and let the function

$$\phi(x) = \int_a^b f(x, y)\,dy \tag{18.61-1}$$

be integrable over $[c, d]$. Then

$$\iint_R f(x, y)\,dA = \int_c^d dx \int_a^b f(x, y)\,dy. \tag{18.61-2}$$

Proof. Let $(x_0, \ldots, x_m)$ be a partition of $[c, d]$ into $m$ equal parts and
$(y_0, \ldots, y_n)$ a partition of $[a, b]$ into $n$ equal parts. We then obtain a partition of
$R$ into $mn$ cells, each of the same area. Let $\Delta x_i = x_i - x_{i-1}$, $\Delta y_j = y_j - y_{j-1}$. Now, if
we add up all the $mn$ terms $f(x_i, y_j)\,\Delta x_i\,\Delta y_j$, where $1 \le i \le m$ and $1 \le j \le n$, we get
an approximation to the value of the double integral of $f(x, y)$ over $R$, and we
can make this approximation as close as we please by taking $m$ and $n$ sufficiently
large. That is, if $\epsilon > 0$, there is some integer $q$ depending on $\epsilon$ such that

$$\left| \sum_{i=1}^{m} \sum_{j=1}^{n} f(x_i, y_j)\,\Delta x_i\,\Delta y_j - \iint_R f(x, y)\,dA \right| < \epsilon \tag{18.61-3}$$

if $q \le m$ and $q \le n$.

Next, since $f(x_i, y)$ is an integrable function of $y$, we know by Theorem VII,
§18.2, that for each $i$

$$\lim_{n \to \infty} \sum_{j=1}^{n} f(x_i, y_j)\,\Delta y_j = \int_a^b f(x_i, y)\,dy.$$

It then follows from (18.61-3) that

$$\left| \sum_{i=1}^{m} \left( \int_a^b f(x_i, y)\,dy \right) \Delta x_i - \iint_R f(x, y)\,dA \right| \le \epsilon \tag{18.61-4}$$

if $q \le m$. Using the definition of $\phi(x)$ [see (18.61-1)], we can rewrite (18.61-4) as

$$\left| \sum_{i=1}^{m} \phi(x_i)\,\Delta x_i - \iint_R f(x, y)\,dA \right| \le \epsilon.$$

This means that

$$\lim_{m \to \infty} \sum_{i=1}^{m} \phi(x_i)\,\Delta x_i = \iint_R f(x, y)\,dA.$$

But since $\phi$ is integrable we know that this last limit is $\int_c^d \phi(x)\,dx$. Therefore

$$\int_c^d \phi(x)\,dx = \iint_R f(x, y)\,dA.$$

This is the same as (18.61-2).


There is, naturally, a corresponding theorem in which the roles of x and y 
are reversed. 
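
The two kinds of sums appearing in the proof of Theorem XVI can be compared numerically. The Python sketch below is an illustration under stated assumptions: the integrand $f(x, y) = xy^2$, the rectangle $0 \le x \le 2$, $1 \le y \le 3$, and the use of right-hand endpoints of equal subintervals are all choices made for the example. The inner integral $\phi(x)$ is written in closed form for this particular $f$.

```python
def f(x, y):
    # Sample integrand, chosen only for illustration.
    return x * y * y

# Rectangle in the notation of Theorem XVI: c <= x <= d, a <= y <= b.
c, d = 0.0, 2.0
a, b = 1.0, 3.0

def double_sum(m, n):
    # Sum of f(x_i, y_j) * dx * dy over all m*n cells, as in (18.61-3).
    dx = (d - c) / m
    dy = (b - a) / n
    total = 0.0
    for i in range(1, m + 1):
        x_i = c + i * dx
        for j in range(1, n + 1):
            y_j = a + j * dy
            total += f(x_i, y_j) * dx * dy
    return total

def phi(x):
    # Inner integral of x * y^2 over a <= y <= b, computed by hand.
    return x * (b ** 3 - a ** 3) / 3

def outer_sum(m):
    # Sum of phi(x_i) * dx, the sum whose limit is the iterated integral.
    dx = (d - c) / m
    return sum(phi(c + i * dx) * dx for i in range(1, m + 1))

# Value of the iterated integral, computed by hand: (d^2 - c^2)/2 * (b^3 - a^3)/3.
exact = (d ** 2 - c ** 2) / 2 * (b ** 3 - a ** 3) / 3

print(double_sum(50, 50), double_sum(400, 400), outer_sum(400), exact)
```

Both sums approach the same number as $m$ and $n$ increase, which is what (18.61-2) asserts for this integrand.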

If $f(x, y)$ is continuous in the rectangle, all the assumptions in Theorem XVI
are satisfied, as we see by using Theorem III, §18.11, and Theorem XIII, §18.5.

Let us now consider Theorem III, §13.3, in either of the forms (13.3-5),
(13.3-6). It is assumed that $f$ is continuous in the region under consideration (see
Fig. 96 and Fig. 97). The boundaries of these regions have zero outer content, for
they are made up of graphs of continuous functions, and it is easy to show that
the graph of a continuous function, e.g., $x = X_1(y)$, $a \le y \le b$, has zero outer
content (the proof depends on uniform continuity). Now let the region be
enclosed in the smallest possible rectangle with sides parallel to the axes, and
extend the definition of $f$ by setting $f(x, y) = 0$ in the part of the rectangle outside
the region $R$. This makes $f$ integrable over the rectangle, and the conditions for
applying Theorem XVI are satisfied. Thus the double integral over the rectangle
is equal to the iterated integral in each of the two possible orders. But, since
$f = 0$ outside of the region $R$, the integrations need not be carried beyond the
boundary of $R$. Thus, for example, the integral with respect to $x$ across the
width of the rectangle becomes simply

$$\int_{X_1(y)}^{X_2(y)} f(x, y)\,dx.$$

In this way we see that (13.3-5) and (13.3-6) are both true.


18.7 / TRIPLE INTEGRALS

The theory of Riemann triple integrals may be developed in a manner closely 
analogous to the theory of double integrals. The ideas are not fundamentally 
different from those in § 18.6. In dealing with integrals over regions in three 
dimensions, it is assumed that the regions have boundaries of outer content zero. 
Regions of familiar shapes, e.g., spheres, cones, and cylinders, are of the required 
type. Likewise, the reduction of triple integrals to iterated integrals of lower order is 
handled in much the same way as the corresponding problem for double integrals in 
§ 18.61. We omit the details. 


18.8 / IMPROPER INTEGRALS 

In the Riemann theory of integration the functions are assumed to be bounded, 
and the intervals or regions of integration are assumed to be bounded. Under 
these conditions, if the function is integrable, the integral is said to be a proper 
integral. There are some extensions of the definitions of integrals. When the 
Riemann theory is taken as basic, as it is in this book, any integral whose 
definition does not come within the framework of the Riemann theory, but which 
is defined by a limiting process depending on the Riemann theory, will be called 
an improper integral. An integral may be improper because it is an integral over 
an unbounded interval or region; or it may be improper because it is the integral 
of an unbounded function. 

Example 1. The integral $\int_0^1 x^{-1/2}\,dx$ is improper because the function
$f(x) = x^{-1/2}$ is not bounded on $[0, 1]$. However, the integral $\int_c^1 x^{-1/2}\,dx$ is proper for each
$c$ if $0 < c < 1$, and we define

$$\int_0^1 x^{-1/2}\,dx = \lim_{c \to 0^+} \int_c^1 x^{-1/2}\,dx.$$

The limit exists and has the value 2, as is easily verified.

Example 2. $\int_1^\infty \dfrac{dx}{x^\alpha}$ is improper, because the interval of integration is not
finite. We define

$$\int_1^\infty \frac{dx}{x^\alpha} = \lim_{b \to \infty} \int_1^b \frac{dx}{x^\alpha}$$

if the limit exists, which is if and only if $\alpha > 1$, as may be verified by the student.
The value in that case is $(\alpha - 1)^{-1}$.

These two examples are instances of the definition of improper integrals as 
limits of proper integrals. Similar procedures can be used to define improper 
double and triple integrals. For a more thorough discussion of improper integrals 
see Chapter 22. 
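
The limiting processes in the two examples can be tabulated directly. The Python sketch below is an illustration only: it evaluates the proper integrals from their elementary antiderivatives (worked out by hand, and assuming the exponent in Example 2, written `alpha` here, is not 1) and lets the variable limit approach its limiting value. The function names and sample values are hypothetical.

```python
def integral_example_1(c):
    # Proper integral of x^(-1/2) over [c, 1], from the antiderivative 2*sqrt(x).
    return 2.0 - 2.0 * c ** 0.5

def integral_example_2(b, alpha):
    # Proper integral of x^(-alpha) over [1, b], valid for alpha != 1.
    return (1.0 - b ** (1.0 - alpha)) / (alpha - 1.0)

# Lower limit c tending to 0 from the right.
values_1 = [integral_example_1(10.0 ** -k) for k in (2, 4, 8)]

# Upper limit b tending to infinity with alpha = 3: the values approach 1/(alpha - 1).
values_2 = [integral_example_2(10.0 ** k, 3.0) for k in (1, 3, 6)]

# The same with alpha = 1/2: the values grow without bound.
diverging = [integral_example_2(10.0 ** k, 0.5) for k in (1, 3, 6)]

print(values_1)
print(values_2)
print(diverging)
```

The first list approaches 2 and the second approaches $1/2$, while the third increases without bound, in agreement with the condition on the exponent stated in Example 2.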


18.9 / STIELTJES INTEGRALS 

The purpose of this section is to give a brief elementary introduction to the 
subject of Stieltjes integrals. These integrals are named after a Dutch
mathematician, T. J. Stieltjes (1856-1894). They have long been used by mathematicians
as a tool in theoretical investigations. The current tendency is in the direction of 
a wider usage, and the student of pure or applied mathematics will sometimes 
encounter the Stieltjes integral in his reading before he has had a chance to learn 
much about the integral. The common practice is to discuss the theory of the 
Stieltjes integral in courses in the theory of functions of a real variable, along 
with (or after) the study of functions of bounded variation. Our intent here is to 
break the ice much earlier, particularly for the student who may never take the 
courses just mentioned. Our discussion will necessarily deal only with the 
rudiments, and will stress definitions, basic properties, and illustrations. Proofs 
are omitted. In large part they are very similar to proofs in the theory of 
Riemann integrals. 

The Stieltjes integral involves two functions $f$ and $g$, each defined on a
closed interval $[a, b]$. It is denoted by

$$\int_a^b f(x)\,dg(x). \tag{18.9-1}$$

In the special case in which $g$ is the simple function $g(x) = x$, the Stieltjes
integral (18.9-1) becomes the Riemann integral

$$\int_a^b f(x)\,dx.$$

To define the Stieltjes integral (18.9-1) we start with a partition $P$ of $[a, b]$:
$(x_0, x_1, \ldots, x_n)$ and a set of points $x_1', \ldots, x_n'$, one in each subinterval of the
partition. We then define

$$\int_a^b f(x)\,dg(x) = \lim_{|P| \to 0} \sum_{i=1}^{n} f(x_i')\,[g(x_i) - g(x_{i-1})], \tag{18.9-2}$$

provided the sums converge to a unique limit as $|P| \to 0$. Here $|P|$ denotes the
mesh fineness of the partition $P$, as defined in §18.2.

Example 1. Find the value of the integral if $f(x) = 1$ and $g(x) = x^2$. The sum
in this case is

$$(x_1^2 - x_0^2) + (x_2^2 - x_1^2) + \cdots + (x_n^2 - x_{n-1}^2) = x_n^2 - x_0^2 = b^2 - a^2.$$

Therefore

$$\int_a^b f(x)\,dg(x) = b^2 - a^2$$

in this case.

This argument works just as well for any $g(x)$ when $f(x) = 1$ for all $x$. Thus

$$\int_a^b 1 \cdot dg(x) = g(b) - g(a), \tag{18.9-3}$$

no matter what kind of a function $g$ is. In particular, $g$ need not be continuous in
(18.9-3).

If $g$ is constant on $[a, b]$, all the differences $g(x_i) - g(x_{i-1})$ are zero, and so
we see that

$$\int_a^b f(x)\,dg(x) = 0 \tag{18.9-4}$$

when $g$ is constant on $[a, b]$. This is true no matter what kind of function $f$ is.
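
The sums in the definition of the Stieltjes integral are easy to compute directly. The Python sketch below is an illustration under stated assumptions: the helper names, the interval $[1, 3]$, the random choice of partition points and of the points $x_i'$, and the sample functions are all made up for the example. It forms one such sum with $f = 1$, $g(x) = x^2$, and another with $g$ constant.

```python
import random

def stieltjes_sum(f, g, points, tags):
    # Sum of f(x_i') * [g(x_i) - g(x_{i-1})] for a partition given by `points`
    # and one tag x_i' in each subinterval.
    return sum(f(t) * (g(points[i + 1]) - g(points[i]))
               for i, t in enumerate(tags))

def random_partition(a, b, n, rng):
    # A partition of [a, b] into n subintervals with arbitrary interior points,
    # together with an arbitrary tag in each subinterval.
    interior = sorted(rng.uniform(a, b) for _ in range(n - 1))
    points = [a] + interior + [b]
    tags = [rng.uniform(points[i], points[i + 1]) for i in range(n)]
    return points, tags

rng = random.Random(0)
a, b = 1.0, 3.0
points, tags = random_partition(a, b, 50, rng)

# f = 1, g(x) = x^2: the sum telescopes to g(b) - g(a) = b^2 - a^2.
telescoped = stieltjes_sum(lambda x: 1.0, lambda x: x * x, points, tags)

# g constant: every difference g(x_i) - g(x_{i-1}) vanishes, whatever f is.
constant_g = stieltjes_sum(lambda x: x ** 3, lambda x: 7.0, points, tags)

print(telescoped, b * b - a * a, constant_g)
```

Because the sum with $f = 1$ telescopes for every partition, no limiting process is really needed in that case; the value is $g(b) - g(a)$ up to rounding.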

One of the important practical uses of Stieltjes integrals involves the case in 
which g is a discontinuous function which has a finite number of discontinuities at 
which it jumps suddenly in value, but remains constant in value in the open 
intervals between the points of discontinuity. The simplest case is that in which 
the only discontinuities are at the endpoints a and b. 

Example 2. Suppose $g(x) = c$ if $a < x < b$, and let $g(a)$ and $g(b)$ have any
values whatever (see Fig. 171). In this case we can show that

$$\int_a^b f(x)\,dg(x) = f(a)\,[c - g(a)] + f(b)\,[g(b) - c], \tag{18.9-5}$$

where $f$ is any continuous function.

Fig. 171.

To see this, let $(x_0, x_1, \ldots, x_n)$ be any partition of
$[a, b]$. Then the differences $g(x_i) - g(x_{i-1})$ will all be
zero as long as both points $x_{i-1}$, $x_i$ are in the open
interval, since $g(x) = c$ there. Hence in (18.9-2) only the
terms for $i = 1$, $i = n$ remain, and

$$\int_a^b f(x)\,dg(x) = \lim_{|P| \to 0} \left\{ f(x_1')\,[g(x_1) - g(x_0)] + f(x_n')\,[g(x_n) - g(x_{n-1})] \right\}.$$

But $x_0 = a$, $x_n = b$, $g(x_1) = g(x_{n-1}) = c$. Also, $x_1' \to a$ and $x_n' \to b$ when $|P| \to 0$.
Therefore, because of the continuity of $f$, $f(x_1') \to f(a)$, $f(x_n') \to f(b)$. When these
results are assembled, we see that (18.9-5) holds.

Observe that the differences c - g(a), g(b) - c are the amounts by which the 
value g(x) jumps at the points of discontinuity as x moves to the right. 

If there are more points of discontinuity it is easy to extend the formula 
(18.9-5). 

Example 3. Suppose $[a, b]$ is divided into $N$ parts by the partition
$(a_0, a_1, \ldots, a_N)$, and suppose $g(x)$ has the value $c_i$ in the interior of the $i$th
subinterval (see Fig. 172). The values of $g(x)$ at the points $a_0, \ldots, a_N$ can be
quite independent of the values $c_1, \ldots, c_N$. If $f$ is continuous on $[a, b]$ we can
show that

$$\int_a^b f(x)\,dg(x) = f(a)\,[c_1 - g(a)] + f(a_1)(c_2 - c_1) + \cdots + f(a_{N-1})(c_N - c_{N-1}) + f(b)\,[g(b) - c_N]. \tag{18.9-6}$$

Fig. 172.

Observe that the values of $g$ at $a_1, \ldots, a_{N-1}$ (the interior points of discontinuity)
do not enter into the formula.

The proof of (18.9-6) can be given with the aid of the general formula

$$\int_a^b f(x)\,dg(x) = \int_{a_0}^{a_1} f(x)\,dg(x) + \int_{a_1}^{a_2} f(x)\,dg(x) + \cdots + \int_{a_{N-1}}^{a_N} f(x)\,dg(x). \tag{18.9-7}$$

This last formula is valid for any functions $f$, $g$ such that all the integrals exist.

As a concrete illustration of the kind of thing arising in Example 3, consider
the moment of inertia of a number of mass particles distributed along the $x$-axis.
Let the masses $m_1, \ldots, m_n$ be located as shown in Fig. 173, with $x_1 < x_2 < \cdots < x_n$.

Fig. 173.

The moment of inertia of this mass system about the $y$-axis is

$$I = m_1 x_1^2 + \cdots + m_n x_n^2. \tag{18.9-8}$$

We shall show how (18.9-8) can be written as a Stieltjes integral. Take any
closed interval $[a, b]$ containing all the points $x_1, \ldots, x_n$. For any $x$ in $[a, b]$
define $g(x) = 0$ if $a \le x \le x_1$, and, if $x_1 < x$, define $g(x)$ as the sum of all the
masses $m_i$ for which $a \le x_i \le x$. The graph of $g(x)$ appears in Fig. 174.

The exact definition of $g(x)$ is

$$g(x) = \begin{cases}
0 & \text{if } a \le x \le x_1 \\
m_1 & \text{if } x_1 < x < x_2 \\
m_1 + m_2 & \text{if } x_2 \le x < x_3 \\
\;\;\vdots \\
m_1 + m_2 + \cdots + m_n & \text{if } x_n \le x \le b.
\end{cases}$$

In this case there is no jump at $b$, and there is a jump at $a$ if and only if $a = x_1$.
Formula (18.9-6) now becomes

$$\int_a^b f(x)\,dg(x) = f(x_1)\,m_1 + \cdots + f(x_n)\,m_n.$$

In particular, if $f(x) = x^2$ we see that

$$\int_a^b x^2\,dg(x) = m_1 x_1^2 + \cdots + m_n x_n^2 = I.$$

This is the desired expression of (18.9-8) as a Stieltjes integral.
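
The passage from a discrete mass system to a Stieltjes integral can be imitated numerically. The Python sketch below is an illustration under stated assumptions: the three masses, their positions, the interval $[0, 5]$, and the use of equal subintervals with midpoints as the points $x_i'$ are hypothetical choices. For simplicity the code takes $g(x)$ to be the total mass at points $x_i \le x$; this may differ from the definition above in the value assigned at a point of discontinuity, which does not affect the integral for interior discontinuities, as observed in Example 3.

```python
masses = [2.0, 1.0, 3.0]       # hypothetical masses m_1, m_2, m_3
positions = [1.0, 2.5, 4.0]    # hypothetical positions x_1 < x_2 < x_3
a, b = 0.0, 5.0                # interval containing all the positions

def g(x):
    # Cumulative mass: total of the masses located at points x_i <= x.
    return sum(m for m, p in zip(masses, positions) if p <= x)

def stieltjes_sum(f, g, a, b, n):
    # Sum of f(x_i') * [g(x_i) - g(x_{i-1})] over n equal subintervals,
    # with x_i' taken as the midpoint of the i-th subinterval.
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        left = a + i * h
        right = a + (i + 1) * h
        total += f(0.5 * (left + right)) * (g(right) - g(left))
    return total

approx = stieltjes_sum(lambda x: x * x, g, a, b, 20000)
exact = sum(m * p * p for m, p in zip(masses, positions))

print(approx, exact)
```

Only the subintervals that contain a mass point contribute to the sum, and each contributes roughly $m_i x_i^2$, so the sum is close to the moment of inertia computed directly.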

It is important to know some of the general conditions under which it is
certain that the limit (18.9-2) defining the Stieltjes integral will exist. As we have
seen, there are interesting and important cases where the integral exists when $g$
has certain discontinuities. The most important conditions sufficient to guarantee
the existence of the integral are that $f$ be continuous and that $g$ be either (1) a
nondecreasing function, (2) a nonincreasing function, or (3) the sum of a
nondecreasing function and a nonincreasing function. A function $g$ is called
nondecreasing if its values never decrease as $x$ increases; it is called
nonincreasing if $g(x)$ never increases as $x$ increases. A function $g$ of the third type in
the foregoing classification is said to be of bounded variation on $[a, b]$. There is
an alternative way of defining such functions, based on the notion of computing
the sum of the total increase and the total decrease of the values of the function
as $x$ goes from $a$ to $b$. We do not have enough space to discuss in detail the
interesting and important properties of functions of bounded variation.

It can be shown that if f b a f(x) dg(x) exists, then J b a g(x) df(x) also exists, and 

f f{x)dg(x) = f(b)g(b)-f(a)g(a)- [ g(x) df(x). (18.9-9) 

J a J a 

This is called the formula of integration by parts. 

Still another important fact about Stieltjes integrals is that if $g$ has a
continuous derivative and if $f$ is integrable in the Riemann sense, then

\[ \int_a^b f(x)\,dg(x) = \int_a^b f(x)g'(x)\,dx, \tag{18.9-10} \]


where the integral on the right is a Riemann integral. 

Example 4. Consider a linear mass distribution on the interval $a \le x \le b$.
Suppose the density $\rho$ at the point $x$ is a continuous function of $x$, denoted by
$\rho(x)$. Then the mass on the interval $[a, x]$ is

\[ m(x) = \int_a^x \rho(t)\,dt. \]

We know that $m'(x) = \rho(x)$; therefore (18.9-10) applies with $m(x)$ in place of
$g(x)$. In particular, the total mass is

\[ M = \int_a^b 1 \cdot dm(x) = \int_a^b \rho(x)\,dx \]

and the abscissa $\bar{x}$ of the center of mass is given by

\[ M\bar{x} = \int_a^b x\,dm(x) = \int_a^b x\rho(x)\,dx. \]
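
The relation between the Stieltjes form and the Riemann form of these integrals can be checked numerically. The Python sketch below assumes a hypothetical density $\rho(x) = 1 + x$ on $[0, 2]$, for which $m(x) = x + x^2/2$, the total mass is 4, and the first moment is $14/3$; the density, the interval, and all function names are illustrative assumptions.

```python
def stieltjes_sum(f, g, a, b, n):
    """Stieltjes sum of f with respect to g, with f evaluated at subinterval midpoints."""
    total = 0.0
    for k in range(n):
        left = a + (b - a) * k / n
        right = a + (b - a) * (k + 1) / n
        total += f(0.5 * (left + right)) * (g(right) - g(left))
    return total


def rho(x):
    """Hypothetical continuous density (an assumption for this example)."""
    return 1.0 + x


def m(x):
    """Cumulative mass on [0, x]: the integral of rho from 0 to x."""
    return x + 0.5 * x * x


A, B, N = 0.0, 2.0, 1000

# Stieltjes integrals with respect to the cumulative mass m(x).
total_mass = stieltjes_sum(lambda x: 1.0, m, A, B, N)
first_moment = stieltjes_sum(lambda x: x, m, A, B, N)

# The same moment written as an ordinary Riemann sum of x * rho(x), i.e. with g(x) = x.
riemann_moment = stieltjes_sum(lambda x: x * rho(x), lambda x: x, A, B, N)

x_bar = first_moment / total_mass
print(total_mass, first_moment, riemann_moment, x_bar)
```

For this density the centroid abscissa comes out close to $7/6$, whichever of the two forms is used for the moment.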


The function m(x) in Example 4 is called a cumulative mass distribution 
function. This same name applies to the function g(x) which is discussed in 
connection with Fig. 174, but in that case there is no continuous density function. 
One of the great uses of the Stieltjes integrals is in the unification of ideas and 
formulas about mass distributions, whether discrete or continuous. 

The notion of a distribution function occurs also in the theory of probability, 
with applications in statistics, and Stieltjes integrals play an important part in the
general formulation of such concepts as first moment, variance, and mathematical
expectation.

EXERCISES

1. Suppose $g(x) = 1$ if $0 \le x < 1$ and $g(x) = 4$ if $1 < x \le 2$. Calculate the values of
$\int_0^2 x^k\,dg(x)$ for $k = 0, 1, 2$.

2. Suppose $g(x) = x$ if $0 < x < 1$, $g(x) = 2 - x$ if $1 < x < 2$, $g(0) = 1$, $g(1) = 0$, $g(2) = 2$.
Calculate $\int_0^1 x\,dg(x)$, $\int_1^2 x\,dg(x)$, $\int_0^2 x\,dg(x)$, and check by (18.9-7).

3. Suppose g(x) = 5-n if n-l<x<n, for n-~ 1, 0, 1, 2, 3, and g(n) = n. 
Compute S- 2 x k dg(x) for k = 0, 1,3. 

4. If g(x) = e~ x , find (a) f 0 \xdg(x), (b) fo g(x) dg(x). 

5. If g(x) = tan" 1 x find (a) fvjx dg(x) and (b) f -X 3 g(x) dg(x). 

6. Let $g(x) = [x]$ (the greatest integer $n$ such that $n \le x$).

(a) Find f<? x dg(x) for the successive values a = 1, i, 2. 

(b) Find JJg(x)d(Vr+P). 

7. A uniform rod 10 feet long weighs 5 pounds and carries three concentrated
weights of 1 pound each fastened at each end and at the point $x = 5$ (the rod extending
from $x = 0$ to $x = 10$). Define $w(x)$ = total weight on $[0, x]$ if $x > 0$, $w(0) = 0$. (a) Draw
the graph of $w(x)$. (b) Calculate $\int_0^{10} x\,dw(x)$ and $\int_0^{10} (x - 5)^2\,dw(x)$.

8. Let $\phi(x)$ be continuous and positive on $[a, b]$. Let $A(x)$ be the area between
$y = \phi(x)$ and $y = 0$ from $a$ to $x$.

(a) Express the total area under the curve, and the first and second moments of this area
with respect to the y-axis, as Stieltjes integrals.

(b) Express the y-co-ordinate of the centroid of the area by a formula involving a
Stieltjes integral with $dA(x)$.

9. If the area in Exercise 8 is revolved around the x-axis, let $V(x)$ be the volume
generated by $A(x)$. Express the co-ordinate $\bar{x}$ for the centroid of the total volume by a
formula involving a Stieltjes integral.

10. Use (18.9-5) and (18.9-7) to prove (18.9-6). 



19 / INFINITE SERIES 


19 / DEFINITIONS AND NOTATION 

It is likely that a person studying this book has already learned some things 
about infinite series at a more elementary level of study. However, to meet the 
needs of students with varying degrees of experience, the basic ideas of the 
subject are presented here without depending on what the student may already 
know about infinite series. The main reason for studying infinite series is that 
they are widely used to define or represent functions. One must know some of 
the tests, methods, and techniques explained in this chapter in order to follow 
the reasoning and understand the use of infinite series in the theory of functions. 
Also, one must have some facility in using infinite series in the study of 
differential equations and in applied mathematics. Chapter 19 is mainly concerned
with questions of convergence and divergence of infinite series. Chapters 20
and 21 are concerned with the study of functions defined by infinite series or 
infinite sequences. 

Perhaps the simplest of all infinite series is the geometric series 

\[ 1 + x + x^2 + x^3 + \cdots + x^n + \cdots. \tag{19-1} \]

Let us briefly summarize the main facts about this series. We start from the 
algebraic identity 

\[ 1 - x^{n+1} = (1 - x)(1 + x + x^2 + \cdots + x^n), \qquad n \ge 1, \]

and rewrite it in the form

\[ \frac{1}{1-x} = 1 + x + \cdots + x^n + \frac{x^{n+1}}{1-x}, \qquad x \ne 1. \tag{19-2} \]

If $-1 < x < 1$ we see that $\lim_{n\to\infty} x^{n+1} = 0$; therefore

\[ \lim_{n\to\infty} (1 + x + \cdots + x^n) = \frac{1}{1-x}. \]

We write this last result in the form

\[ \frac{1}{1-x} = 1 + x + x^2 + \cdots + x^n + \cdots, \qquad -1 < x < 1. \tag{19-3} \]

Here we have an infinite-series representation of the function $(1 - x)^{-1}$, valid
subject to certain limitations on $x$.
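
The identity (19-2) and the limit (19-3) are easy to check by machine. The Python sketch below is only an illustration: it verifies (19-2) in exact rational arithmetic for one sample value of $x$ and $n$, and then evaluates a floating-point partial sum for $x = 1/2$. The function name and the sample values are assumptions made for the example.

```python
from fractions import Fraction


def geometric_partial_sum(x, n):
    """Return 1 + x + x**2 + ... + x**n."""
    total, power = 0, 1
    for _ in range(n + 1):
        total += power
        power *= x
    return total


# Exact check of (19-2) at a sample point, using rational arithmetic.
x = Fraction(1, 3)
n = 10
lhs = 1 / (1 - x)
rhs = geometric_partial_sum(x, n) + x ** (n + 1) / (1 - x)
print(lhs, rhs)

# Floating-point illustration of (19-3): the partial sums approach 1/(1 - x).
print(geometric_partial_sum(0.5, 60), 1 / (1 - 0.5))
```

Because $x^{n+1}/(1-x)$ is the exact difference between $1/(1-x)$ and the partial sum, the speed of convergence is governed entirely by how fast $x^{n+1}$ tends to zero.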

Let us now make some formal definitions. Suppose that $u_0, u_1, u_2, \ldots, u_n, \ldots$
is an infinite sequence of numbers. The expression

\[ u_0 + u_1 + u_2 + \cdots + u_n + \cdots \tag{19-4} \]

is called an infinite series, and the numbers $u_0, u_1, \ldots$ are called the terms of the
series. The series is in a formal sense just a certain collection of mathematical
symbols arranged in a certain way. Now we consider the sequence of numbers
$s_0, s_1, s_2, \ldots, s_n, \ldots$ formed as follows:

\[
\begin{aligned}
s_0 &= u_0, \\
s_1 &= u_0 + u_1, \\
s_2 &= u_0 + u_1 + u_2, \\
&\;\;\vdots \\
s_n &= u_0 + u_1 + \cdots + u_n.
\end{aligned}
\]

If the sequence $s_n$ has a limit as $n \to \infty$ we say that the series (19-4) is
convergent; if the limit of $s_n$ is $s$ we say that $s$ is the sum (or value) of the series,
and we write

\[ s = u_0 + u_1 + u_2 + \cdots + u_n + \cdots. \]

If the series is not convergent we say that it is divergent, and we do not assign it
any sum. The numbers $s_0, s_1, \ldots, s_n, \ldots$ are called the partial sums of the series.

We have made the definitions in the case where the terms are numerical 
constants. If the $u$’s are functions of a variable $x$, they assume definite numerical
values when x is given a fixed value, and then all the foregoing definitions apply. 
In such a case the partial sums s n will also be functions of x. 

Let us return to the series (19-1). We have seen that it is convergent if
$-1 < x < 1$, the sum then being given by (19-3). For example,

\[ \frac{3}{2} = 1 + \frac{1}{3} + \frac{1}{3^2} + \cdots + \frac{1}{3^n} + \cdots. \]

But if $x \le -1$ or $1 \le x$ the series is divergent. If $x \ge 1$ this is clear at once, for
then $s_n \ge n + 1$, and so $s_n \to +\infty$ as $n \to \infty$. If $x = -1$ the series becomes

\[ 1 - 1 + 1 - 1 + \cdots, \]

with $s_0 = 1$, $s_1 = 0$, $s_2 = 1$, $s_3 = 0, \ldots$. This sequence $\{s_n\}$ is not convergent, and so
the series is divergent. How do the partial sums behave when $x < -1$?

From a given series (19-4) we may form new series by omitting a certain 
finite number of terms at the beginning, e.g., 

\[ u_2 + u_3 + u_4 + \cdots, \]

\[ u_{100} + u_{101} + \cdots. \]

All such series will be convergent if the original series is convergent, and
divergent if the original series is divergent (why?). Likewise we may form a new
series by multiplying each term of the original series by the same constant $c$:

\[ cu_0 + cu_1 + \cdots + cu_n + \cdots. \]

If $c \ne 0$ this series will be convergent if and only if the original series is
convergent (why?). If the original series had sum $s$, the new series has sum $cs$.
We frequently use the summation symbol $\sum$ in dealing with series:

\[ \sum_{n=1}^{N} u_n = u_1 + u_2 + \cdots + u_N, \tag{19-5} \]

\[ \sum_{i=m}^{n} u_i = u_m + u_{m+1} + \cdots + u_n. \tag{19-6} \]

Observe that the index $n$ in (19-5) and the index $i$ in (19-6) are dummy indices;
this means that the expression is not altered if the index letter is changed:

\[ \sum_{n=1}^{N} u_n = \sum_{k=1}^{N} u_k. \]

A sum such as (19-6) is in certain respects analogous to an integral $\int_a^b f(x)\,dx$, in
which $x$ is a dummy variable. The $u_i$ of (19-6) corresponds to $f(x)$; the limits of
summation $i = m$ and $i = n$ correspond to the limits $x = a$ and $x = b$ on the
integral.

The infinite series (19-4) is now denoted by

\[ \sum_{n=0}^{\infty} u_n \tag{19-7} \]

whether it is convergent or not. Often, for convenience in printing, we write $\sum u_n$
in place of (19-7), suppressing the limits 0 and $\infty$.

Finally, we note that $u_n$ is not necessarily the nth term of the series. The
index notation merely expresses the fact that $u_n$ is a function of $n$. It is very
important to observe that if the series (19-7) is convergent, then $u_n \to 0$ as $n \to \infty$.
For $u_n = s_n - s_{n-1}$. If the series is convergent, $s_n \to s$ and $s_{n-1} \to s$ as $n \to \infty$, and so
$\lim u_n = \lim s_n - \lim s_{n-1} = 0$. But although $u_n \to 0$ is a necessary condition for
convergence, it is by no means sufficient. A series may diverge, even though
$u_n \to 0$ (see Example 1, §19.2).


EXERCISES 


1. For what values of x is each of the following series convergent? Express the sum 
of the series as a simple function of x in each case. 

(a) $a + ax + ax^2 + \cdots + ax^n + \cdots$, $a \ne 0$;

(b) $8x + 8x^3 + 8x^5 + \cdots + 8x^{2n+1} + \cdots$;

(c) $cx^2 + cx^4 + cx^6 + \cdots + cx^{2n} + \cdots$;


X X X 


(e) 1 + 




1 

a + *y 


X (1 + x) 

® 2+2 (irf) +2 (iri) 2+ --- 

(g) $e^x + e^{2x} + e^{3x} + \cdots + e^{nx} + \cdots$;

(h) $\log x + (\log x)^2 + \cdots + (\log x)^n + \cdots$.

2. Explain why the series

\[ \frac{1+1}{1+2} + \frac{1+2}{1+4} + \cdots + \frac{1+n}{1+2n} + \cdots \]

is divergent.

3. Prove that the series 

- - +-+ — i— — » +... 

V1000 + 4 V 1000 + 9 V 1000 + n 2 

is divergent. 

4. Discuss the behavior of the partial sums s n in the case of each of the following 
series: 

(a) 1 — 2+3 — 4 + 5 ; 

(b) 1 + 5 — 1+1— 1+g — 1 + • * ■ ; 

(c) (1 - 5) + (2 — 3) + (3 — i) + • • • ; 

(d) $(\sqrt{2} - \sqrt{1}) + (\sqrt{3} - \sqrt{2}) + (\sqrt{4} - \sqrt{3}) + \cdots$.

Which, if any, of the series are convergent? One can tell that two of the series are 
divergent without investigating the partial sums s n . Which are these series, and how does 
one know by quick inspection that they are divergent? 

5. Is the series 


(101)! + (102)! + " ' + (100+ n)! + ' 
convergent or divergent? Justify your answer. 


19.1 / TAYLOR’S SERIES


In order to become familiar with a number of interesting examples of infinite 
series, let us examine how we are led to study the representation of functions by 
infinite series. The starting point is Taylor’s formula with remainder (Chapter 4, 
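especially §4.3). For instance, in Examples 3 and 4 of §4.3 we saw that if $x > -1$,

\[ \log(1+x) = x - \tfrac{1}{2}x^2 + \tfrac{1}{3}x^3 - \cdots + (-1)^{n-1}\frac{x^n}{n} + R_{n+1}, \]

where

\[ |R_{n+1}| \le \frac{x^{n+1}}{n+1} \quad \text{if } 0 \le x \le 1 \]

and

\[ |R_{n+1}| \le \frac{|x|^{n+1}}{1+x} \quad \text{if } -1 < x \le 0. \]

These inequalities show that $R_{n+1} \to 0$ as $n \to \infty$ when $x$ is limited as indicated.
Therefore

\[ \log(1+x) = \lim_{n\to\infty}\left[x - \tfrac{1}{2}x^2 + \tfrac{1}{3}x^3 - \cdots + (-1)^{n-1}\frac{x^n}{n}\right] \]

if $-1 < x \le 1$. According to the definitions in §19 this result may be written in the
form

\[ \log(1+x) = x - \tfrac{1}{2}x^2 + \tfrac{1}{3}x^3 - \cdots + (-1)^{n-1}\frac{x^n}{n} + \cdots. \tag{19.1-1} \]

As a special instance, let $x = 1$; then

\[ \log 2 = 1 - \tfrac{1}{2} + \tfrac{1}{3} - \tfrac{1}{4} + \cdots + (-1)^{n-1}\frac{1}{n} + \cdots. \tag{19.1-2} \]

Another series representation of $\log 2$ may be obtained by putting $x = -\tfrac{1}{2}$ in
(19.1-1). We find

\[ \log\tfrac{1}{2} = -\log 2 = -\tfrac{1}{2} - \tfrac{1}{2}\left(-\tfrac{1}{2}\right)^2 + \tfrac{1}{3}\left(-\tfrac{1}{2}\right)^3 - \cdots, \]

\[ \log 2 = \frac{1}{2} + \frac{1}{2\cdot 2^2} + \frac{1}{3\cdot 2^3} + \cdots + \frac{1}{n\cdot 2^n} + \cdots. \tag{19.1-3} \]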
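
The two representations of $\log 2$ converge at very different rates, which a short computation makes visible. The Python sketch below is an illustration only; the function names and the choice of 20 terms are assumptions, and `math.log` is used merely as a reference value.

```python
import math


def log2_alternating(n_terms):
    """Partial sum of 1 - 1/2 + 1/3 - ... (series (19.1-2)) with n_terms terms."""
    return sum((-1) ** (n - 1) / n for n in range(1, n_terms + 1))


def log2_halves(n_terms):
    """Partial sum of 1/2 + 1/(2*2^2) + 1/(3*2^3) + ... (series (19.1-3))."""
    return sum(1.0 / (n * 2 ** n) for n in range(1, n_terms + 1))


reference = math.log(2.0)
error_alternating = abs(log2_alternating(20) - reference)
error_halves = abs(log2_halves(20) - reference)
print(error_alternating, error_halves)
```

With the same number of terms the second series is far more accurate, because its terms shrink geometrically while those of the first shrink only like $1/n$.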


Now suppose that $f$ is a function which has derivatives of all orders on an
open interval containing the point $x = a$. According to Taylor’s formula (4.3-8)
we have, for any $x$ of this interval,

\[ f(x) = f(a) + f'(a)(x - a) + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n + R_{n+1}, \]

where the remainder $R_{n+1}$ can be expressed in a variety of different forms. If we
can show, for a particular $x$, that $\lim_{n\to\infty} R_{n+1} = 0$, then it follows that $f(x)$ can be
represented by an infinite series

\[ f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x - a)^n \tag{19.1-4} \]

(with the usual convention that $n! = 1$ if $n = 0$). This is called Taylor’s series
expansion of $f(x)$ about the point $x = a$ (also sometimes called the expansion of
$f(x)$ in powers of $x - a$). Observe that we are not asserting that (19.1-4) is
always true. It will be true if $\lim_{n\to\infty} R_{n+1} = 0$, but this is a matter to be
investigated for each particular function and each particular $x$.

Example 1. The function $e^x$ can be represented by Taylor’s series for all
values of $x$, no matter how the point $x = a$ is chosen.

In order to prove what has just been stated, let us first choose $a = 0$. Then,
since $f^{(n)}(x) = e^x$ for all orders $n$, Taylor’s formula with remainder is

\[ e^x = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + R_{n+1}. \]

Lagrange’s form of the remainder is

\[ R_{n+1} = \frac{e^X x^{n+1}}{(n+1)!}, \]

with $X$ between 0 and $x$. Thus certainly $-|x| < X < |x|$, and $0 < e^X < e^{|x|}$; therefore

\[ |R_{n+1}| \le e^{|x|}\frac{|x|^{n+1}}{(n+1)!}. \tag{19.1-5} \]

We shall prove that $\lim_{n\to\infty} R_{n+1} = 0$. From (19.1-5) we see that it will be sufficient
to prove that $\lim_{n\to\infty} |x|^n/n! = 0$. Now choose an integer $N$ such that $N \ge 2|x|$. Then,
if $n > N$,

\[ \frac{|x|^n}{n!} = \frac{|x|^N}{N!}\cdot\frac{|x|^{n-N}}{(N+1)(N+2)\cdots n} \le \frac{|x|^N}{N!}\left(\frac{N}{N+1}\right)\left(\frac{N}{N+2}\right)\cdots\left(\frac{N}{n}\right)\frac{1}{2^{n-N}}, \]

\[ \frac{|x|^n}{n!} \le \frac{|x|^N}{N!}\left(\frac{1}{2}\right)^{n-N}. \]

Keeping $N$ fixed and letting $n \to \infty$, we see that $(\tfrac{1}{2})^{n-N} \to 0$, and so the desired
result is attained. Therefore the series representation

\[ e^x = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + \cdots \tag{19.1-6} \]

is valid for all values of $x$.

Now consider any point $x = a$. Taylor’s formula for $e^x$ now takes the form

\[ e^x = e^a + e^a(x - a) + \cdots + e^a\frac{(x-a)^n}{n!} + e^X\frac{(x-a)^{n+1}}{(n+1)!}, \]

with $X$ between $a$ and $x$. A very slight modification of the foregoing arguments
shows that the remainder approaches zero as $n \to \infty$, so that the representation

\[ e^x = e^a + e^a(x - a) + \cdots + e^a\frac{(x-a)^n}{n!} + \cdots \tag{19.1-7} \]

is valid for all values of $x$.
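
The remainder estimate (19.1-5) can be tested numerically. The Python sketch below compares the actual error of the partial sums of (19.1-6) with the bound $e^{|x|}|x|^{n+1}/(n+1)!$ for a few sample values; the function names and sample points are assumptions chosen for illustration, and `math.exp` serves only as a reference value.

```python
import math


def exp_partial_sum(x, n):
    """Return 1 + x + x^2/2! + ... + x^n/n!."""
    term, total = 1.0, 1.0
    for k in range(1, n + 1):
        term *= x / k          # builds x^k / k! from the previous term
        total += term
    return total


def remainder_bound(x, n):
    """The bound of (19.1-5): e^{|x|} |x|^{n+1} / (n+1)!."""
    return math.exp(abs(x)) * abs(x) ** (n + 1) / math.factorial(n + 1)


for x in (-3.0, 0.5, 4.0):
    for n in (5, 10, 20):
        error = abs(math.exp(x) - exp_partial_sum(x, n))
        print(x, n, error, remainder_bound(x, n))
```

For each fixed $x$ the bound eventually decreases faster than any geometric sequence, which is the content of the argument with $N \ge 2|x|$.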


Example 2. The functions $\sin x$ and $\cos x$ can be represented by Taylor’s
series for all values of $x$, no matter how the point $x = a$ is chosen. The series
take particularly simple forms when $a = 0$, namely

\[ \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots, \tag{19.1-8} \]

\[ \cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots. \tag{19.1-9} \]

The details are left to the student (see Exercise 1). 

In general the problem of finding a Taylor’s series representation for a given
function, and proving that the representation is valid, can be a very difficult task,
especially if we approach the problem directly as in the foregoing examples. It
may prove to be a very complicated matter to calculate the derivatives $f^{(n)}(x)$ for
higher values of $n$, and one cannot always expect to find a manageable general
formula for the coefficient $f^{(n)}(a)/n!$. Moreover, if one cannot get a reasonably
simple formula for $f^{(n+1)}(x)$, there is very little chance of proving that $R_{n+1} \to 0$ as
$n \to \infty$ by means of the standard formulas for the remainder (Lagrange, Cauchy,
or the integral form (4.2-8)). There are, however, more advanced methods for
attacking the problem of finding out whether a function can be represented by
Taylor’s series. The most important of these methods belongs to the theory of
functions of a complex variable, and is beyond the scope of this book.

In certain particular cases which are interesting and important there are
special devices which enable us to find series representations without the
necessity of calculating a general formula for $f^{(n)}(x)$. One such device is
considered in §19.11. Other devices are considered in Chapter 21.


EXERCISES 

1. Prove the validity of the series expansions (19.1-8) and (19.1-9) for sin x and cos x. 

2. Prove the validity of the expansion

\[ (1-x)^{-1/2} = 1 + \frac{1}{2}x + \frac{1\cdot 3}{2\cdot 4}x^2 + \cdots + \frac{1\cdot 3\cdots(2n-1)}{2\cdot 4\cdots 2n}x^n + \cdots \]

when $-1 < x < 1$. Suggestion: See Exercise 7, §4.3. For the case $0 < x < 1$ it is
convenient to use the fact that $na^n \to 0$ as $n \to \infty$ if $|a| < 1$. Prove this fact separately.


19.11 / A SERIES FOR THE INVERSE TANGENT 

In the formula (19-2) let us put $x = -t^2$. Then

\[ \frac{1}{1+t^2} = 1 - t^2 + t^4 - \cdots + (-1)^n t^{2n} + (-1)^{n+1}\frac{t^{2n+2}}{1+t^2}. \]

Integrating both sides of this algebraic identity, we have, for any $x$,

\[ \int_0^x \frac{dt}{1+t^2} = \tan^{-1}x = x - \frac{x^3}{3} + \frac{x^5}{5} - \cdots + (-1)^n\frac{x^{2n+1}}{2n+1} + (-1)^{n+1}\int_0^x \frac{t^{2n+2}}{1+t^2}\,dt. \tag{19.11-1} \]

We apply Theorem V of §4.4 to the integral on the right:

\[ \int_0^x \frac{t^{2n+2}}{1+t^2}\,dt = \frac{1}{1+X^2}\int_0^x t^{2n+2}\,dt = \frac{1}{1+X^2}\cdot\frac{x^{2n+3}}{2n+3}, \]

where $X$ is between 0 and $x$. Now if $|x| \le 1$,

\[ \lim_{n\to\infty}\frac{x^{2n+3}}{2n+3} = 0, \]

and therefore we conclude that

\[ \tan^{-1}x = x - \frac{x^3}{3} + \frac{x^5}{5} - \cdots + (-1)^n\frac{x^{2n+1}}{2n+1} + \cdots. \tag{19.11-2} \]

This series representation is valid if $-1 \le x \le 1$. In particular, if $x = 1$, we find
the interesting result

\[ \frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots. \tag{19.11-3} \]

The series (19.11-2) is actually the Taylor’s series expansion of $\tan^{-1}x$ in powers
of $x$, even though we did not find the terms of the series by calculating the
successive derivatives of $\tan^{-1}x$. This latter procedure would lead to great
complications, as the student may find if he attempts it. But it can be shown in
the general theory of power series, that if a function can be represented by a
series of powers of $x - a$, then the series is actually the Taylor’s series expansion
of the function about the point $x = a$.
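
The derivation above also yields an explicit error estimate: since $1/(1+X^2) \le 1$, the remainder after the term of degree $2n+1$ is at most $|x|^{2n+3}/(2n+3)$ in absolute value. The Python sketch below checks this estimate at a few sample points; the function names and sample values are assumptions for illustration, and `math.atan` is used only as a reference value.

```python
import math


def arctan_partial_sum(x, n):
    """Return x - x^3/3 + x^5/5 - ... + (-1)^n x^(2n+1)/(2n+1)."""
    return sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1) for k in range(n + 1))


def arctan_remainder_bound(x, n):
    """Bound |x|^(2n+3)/(2n+3) read off from the integral remainder."""
    return abs(x) ** (2 * n + 3) / (2 * n + 3)


for x, n in ((1.0, 1000), (0.2, 5), (-0.5, 10)):
    error = abs(math.atan(x) - arctan_partial_sum(x, n))
    print(x, n, error, arctan_remainder_bound(x, n))
```

At $x = 1$ the bound decreases only like $1/(2n+3)$, so (19.11-3) is a slow way to compute $\pi$; for small $|x|$ the series converges rapidly.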


EXERCISES 

1. Show that $\pi = 16\tan^{-1}\tfrac{1}{5} - 4\tan^{-1}\tfrac{1}{239}$. Start by setting $\phi = \tan^{-1}\tfrac{1}{5}$ and computing
$\tan 2\phi$ and $\tan 4\phi$ by repeated use of the formula for the tangent of twice an angle. Then
compute $\tan(4\phi - \tfrac{\pi}{4})$. If the inverse tangents in this formula for $\pi$ are computed by the
series (19.11-2), a value of $\pi$ accurate to a number of decimal places may be found
without excessive labor. As a sample, let the student obtain the approximation $\pi$ =
3.141593, of which the first five decimal places are accurate and the sixth is correct as a
rounded-off figure.

2. Show that $2\tan^{-1}\tfrac{1}{10} - \tan^{-1}\tfrac{1}{5} = \tan^{-1}\tfrac{1}{515}$. Combine this with the result of Exercise 1
to obtain the formula

\[ \frac{\pi}{4} = 8\tan^{-1}\frac{1}{10} - 4\tan^{-1}\frac{1}{515} - \tan^{-1}\frac{1}{239}. \]

Try this formula along with (19.11-2) in the computation of $\pi$.


19.2 / SERIES OF NONNEGATIVE TERMS 

For a deeper study of the general problem of representation of functions by 
infinite series it is essential to have a certain fund of general knowledge about 
infinite series as such. We turn then to a different problem. Suppose we have 
before us a series, no matter how obtained. What can we do to determine 
whether or not the series is convergent? We begin by considering the special 
case in which all the terms of the series are nonnegative. Special though this 
case is, what we learn about it will have important applications to the general 
problem. 

THEOREM I. Suppose that $u_n \ge 0$ for every $n$. Then the series $\sum u_n$ is convergent
if and only if the sequence $\{s_n\}$ of partial sums is bounded.

Proof. By definition $s_n = u_0 + u_1 + \cdots + u_n$. Since $u_n \ge 0$, it is clear that
$0 \le s_n \le s_{n+1}$. If the partial sums are bounded, the sequence $\{s_n\}$ has a limit, by
Theorem III, §2.7. On the other hand, if the sequence is not bounded, then
$s_n \to +\infty$ as $n \to \infty$. This completes the proof of the theorem.

Example 1. The series

\[ 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} + \cdots \tag{19.2-1} \]

is divergent. It is called the harmonic series.

We prove the divergence by showing that the partial sums are not bounded.
Let

\[ s_n = 1 + \frac{1}{2} + \cdots + \frac{1}{n}. \]

Then

\[ s_{2n} = s_n + \frac{1}{n+1} + \frac{1}{n+2} + \cdots + \frac{1}{2n} \ge s_n + \frac{1}{2}, \]

since

\[ \frac{1}{n+1} + \frac{1}{n+2} + \cdots + \frac{1}{2n} \ge n\cdot\frac{1}{2n} = \frac{1}{2}. \]

With $s_{2n} \ge s_n + \tfrac{1}{2}$ for every $n$, it is plainly impossible for $\{s_n\}$ to be bounded. We
have

\[ s_1 = 1, \quad s_2 = \tfrac{3}{2}, \quad s_4 > s_2 + \tfrac{1}{2} = 2, \quad s_8 > s_4 + \tfrac{1}{2} > \tfrac{5}{2}, \]

and in general

\[ s_{2^n} \ge \frac{n+2}{2}. \]

For a series with negative as well as positive terms, convergence of the 
series is not guaranteed by boundedness of the partial sums. For instance, the 
series 


\[ 1 - 1 + 1 - 1 + 1 - \cdots \]

is not convergent, yet for its partial sums we have s n = 1 or s n = 0, depending on 
the oddness or evenness of n. Thus these partial sums are bounded. 

THEOREM II. Let $\sum a_n$ and $\sum b_n$ be two series of nonnegative terms, and suppose
that, for all values of $n$ after some fixed index $N$, it is true that $a_n \le b_n$. Then
if the series $\sum b_n$ is convergent, so is $\sum a_n$, and if the series $\sum a_n$ is divergent, so
is $\sum b_n$.

Proof. In discussing convergence or divergence we may drop the terms with
index less than $N$. Then, for any $n > N$,

\[ a_N + a_{N+1} + \cdots + a_n \le b_N + b_{N+1} + \cdots + b_n. \]

The proof of the theorem is an immediate consequence of this inequality and
Theorem I.

Example 2. The series

\[ 1 + \frac{1}{2^2} + \frac{1}{3^3} + \cdots + \frac{1}{n^n} + \cdots \tag{19.2-2} \]

is convergent. For $\dfrac{1}{n^n} \le \dfrac{1}{n!}$ if $n \ge 1$, and the series

\[ 1 + \frac{1}{1!} + \frac{1}{2!} + \cdots + \frac{1}{n!} + \cdots \tag{19.2-3} \]

is known to be convergent (see the series (19.1-6) for $e^x$ with $x = 1$; see also
Example 5, §1.62).

THEOREM III. Let $\sum a_n$ and $\sum b_n$ be two series of positive terms, and suppose
that $a_n/b_n$ approaches a nonzero limit as $n \to \infty$. Then either both series are
convergent, or both are divergent.

Proof. Suppose that $a_n/b_n \to c$, and $c \ne 0$. Then, for all sufficiently large
values of $n$ we shall have $\tfrac{1}{2}c < a_n/b_n < \tfrac{3}{2}c$, whence

\[ b_n < \left(\frac{2}{c}\right)a_n \quad \text{and} \quad a_n < \left(\frac{3}{2}c\right)b_n. \]

Since the convergence or divergence of a series is not affected by multiplying
each of its terms by the same nonzero constant, the foregoing inequalities
together with Theorem II are sufficient to prove Theorem III.

Example 3. The series

\[ \frac{2^2}{3!} + \frac{3^2}{4!} + \cdots + \frac{(n+1)^2}{(n+2)!} + \cdots \]

is convergent. We prove this by using the convergent series (19.2-3) and
applying Theorem III:

\[ \frac{(n+1)^2}{(n+2)!} \bigg/ \frac{1}{n!} = \frac{(n+1)^2}{(n+1)(n+2)} = \frac{n+1}{n+2} \to 1. \]

Example 4. The series

\[ 1 + \frac{1}{3} + \frac{1}{5} + \cdots + \frac{1}{2n-1} + \cdots \]

is divergent. We prove this by using the series (19.2-1) and applying Theorem
III.

\[ \frac{1}{2n-1} \bigg/ \frac{1}{n} = \frac{n}{2n-1} \to \frac{1}{2}. \]
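
The limits used in Examples 3 and 4 can be observed numerically. The Python sketch below evaluates the ratios $a_n/b_n$ for increasing $n$; the function names and the sample values of $n$ are assumptions for illustration, and a table of ratios is of course only suggestive, not a proof that the limit exists.

```python
import math


def ratio_example_3(n):
    """a_n / b_n with a_n = (n+1)^2/(n+2)! and b_n = 1/n!."""
    return (n + 1) ** 2 * math.factorial(n) / math.factorial(n + 2)


def ratio_example_4(n):
    """a_n / b_n with a_n = 1/(2n-1) and b_n = 1/n."""
    return n / (2 * n - 1)


for n in (1, 10, 100, 1000):
    print(n, ratio_example_3(n), ratio_example_4(n))
```

The first ratio tends to 1 and the second to $\tfrac{1}{2}$; both limits are nonzero, so each series behaves like its comparison series.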
EXERCISES

1. Test the following series for convergence or divergence by Theorem II. 

(a) E ~ 

n=lVn 

<b »l,i ST 

(C) 1+ T3 + 1 • -3 • 5 + ' + 1 -3 • • '(2n- 1) + " ' ' 

(d) 2^2 + n 5+ ' ■ ' + (n + 1)2” + ‘ ' ■ ■ 

2. Prove that $(n+1)/n! < 8/2^n$ if $n \ge 1$. What do you conclude about the following
series?

\[ \frac{2}{1!} + \frac{3}{2!} + \cdots + \frac{n+1}{n!} + \cdots \]


3. Compare the series 


il2 + 5 1 3 
3! 4! 


(n + 3)(n + l) 


3! 4! (n+2)! 

with a suitable multiple of the series (19.2-3), and use Theorem II to establish con- 
vergence. Also prove the convergence of the series by using the series (19.2-3) and 
Theorem III. 

4. Compare the series 

3,3-5 3 • 5 - 7 - - - (2w + 1) 

5 5 -10 5 ■ 10 • 15 ■ • • 5n 


with a suitable multiple of the series 


1 + |+-- -+2^ 


and use Theorem II to establish convergence. 

5. Test the following series for convergence or divergence by Theorem III. 


(a) £ 

n = 1 


n 4- 1 
(n+2)n\ 


(b) 2 


2n + 1 
n 2 + n 



oo 

(d) 2 

n — 1 


Vn + 1 
If™” 


/r x *(n + m+ 2 ) 

(e) h n 2 * 2 n 

(f) 2 n (l ~ nyn (see Exercise 17, §1.62). 

n = 1 

(g) £ ( ” + J r (see (1.62-5)). 

n = I n 


6. Prove that, if $\sum a_n$ and $\sum b_n$ are series of positive terms with $\sum b_n$ convergent and
$a_n/b_n \to 0$, then $\sum a_n$ is convergent. State and prove a comparison theorem in which part of
the hypothesis is that $a_n/b_n \to +\infty$.

7. Let $\sum a_n$ be a convergent series of positive terms, and let $\sum b_k$ be a series in which
$b_k = a_{n_k}$, where $n_1, n_2, \ldots, n_k, \ldots$ is a sequence of positive integers such that $n_i \ne n_j$ if
$i \ne j$. Prove that the series $\sum b_k$ is convergent.

8. Show that the series

\[ \frac{1}{1\cdot 2} + \frac{1}{2\cdot 3} + \cdots + \frac{1}{n(n+1)} + \cdots \]

is convergent by noting that

\[ \frac{1}{n(n+1)} = \frac{1}{n} - \frac{1}{n+1} \]

and finding a simple expression for the sum of the first $n$ terms. What can you infer about
the series

\[ \frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots \]

and

\[ \frac{1}{1\cdot 2} + \frac{1}{3\cdot 4} + \frac{1}{5\cdot 6} + \cdots? \]

9. Prove that the series

\[ \log\frac{2^2}{1\cdot 3} + \frac{1}{2}\log\frac{3^2}{2\cdot 4} + \cdots + \frac{1}{n}\log\frac{(n+1)^2}{n(n+2)} + \cdots \]

is convergent by showing directly that the sum of the first $n$ terms is less than

\[ \log\frac{2(n+1)}{n+2} < \log 2. \]


19.21 / THE INTEGRAL TEST 

For this section the student will need to be familiar with a few of the most
elementary things about improper integrals of the type

\[ \int_a^{\infty} f(x)\,dx = \lim_{b\to\infty}\int_a^b f(x)\,dx. \]

The integral is called convergent if the limit exists, and divergent if the limit does
not exist. We refer the student to §18.8 and §22.

The idea of the integral test is to relate the convergence or divergence of a
series

\[ u_1 + u_2 + u_3 + \cdots + u_n + \cdots \]

to the convergence or divergence of a certain improper integral $\int_a^{\infty} f(x)\,dx$. A
relationship of this sort can be established for certain kinds of series. For the
series

\[ \frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \cdots + \frac{1}{n^2} + \cdots \]

the appropriate integral to consider is $\int_1^{\infty} \dfrac{dx}{x^2}$. In general one wants a function $f(x)$
such that $f(n) = u_n$.

THEOREM IV. Let $f(x)$ be a function which is positive, continuous, and
nonincreasing as $x$ increases for all values of $x \ge N$, where $N$ is some fixed
positive integer. Let the terms of an infinite series be given by $u_n = f(n)$ when
$n \ge N$. Then the series $\sum u_n$ converges or diverges according as the improper
integral $\int_N^{\infty} f(x)\,dx$ is convergent or divergent.


Proof. If $m \le x \le m + 1$, we have $f(m+1) \le f(x) \le f(m)$, and therefore

\[ u_{m+1} = f(m+1) \le \int_m^{m+1} f(x)\,dx \le f(m) = u_m. \]

We set $m$ successively equal to $N, N+1, \ldots, n$ and add. The result is

\[ u_{N+1} + \cdots + u_{n+1} \le \int_N^{n+1} f(x)\,dx \le u_N + \cdots + u_n. \tag{19.21-1} \]

Suppose now that the integral $\int_N^{\infty} f(x)\,dx$ is convergent. Then from the first
inequality in (19.21-1) it follows that

\[ u_{N+1} + \cdots + u_{n+1} \le \int_N^{\infty} f(x)\,dx. \]

This shows that the partial sums of the series $\sum_{n=N+1}^{\infty} u_n$ are bounded, and hence
that this series is convergent (by Theorem I, §19.2). The series $\sum u_n$ is then
convergent also. On the other hand, suppose the integral $\int_N^{\infty} f(x)\,dx$ is divergent.
Since $f(x) > 0$ this can only happen if $\int_N^{n+1} f(x)\,dx \to +\infty$ as $n \to \infty$. It then follows
from the second inequality in (19.21-1) that $u_N + \cdots + u_n \to +\infty$, and hence that
the series is divergent. This completes the proof.

Example. The series

\[ \frac{1}{1^p} + \frac{1}{2^p} + \frac{1}{3^p} + \cdots + \frac{1}{n^p} + \cdots \tag{19.21-2} \]

is convergent if $p > 1$ and divergent if $p \le 1$.

These results are established by considering the integral $\int_1^{\infty} \dfrac{dx}{x^p}$ and applying
Theorem IV. If $p > 1$ we have

\[ \int_1^b \frac{dx}{x^p} = \left.\frac{x^{-p+1}}{-p+1}\right|_1^b = \frac{1}{1-p}\left(\frac{1}{b^{p-1}} - 1\right), \]

\[ \lim_{b\to\infty}\int_1^b \frac{dx}{x^p} = \int_1^{\infty}\frac{dx}{x^p} = \frac{1}{p-1}. \]

We leave it for the student to verify that the integral diverges if $p \le 1$. If $p = 1$
this gives us a new proof that the harmonic series is divergent (see Example 1,
§19.2).

The proof of Theorem IV gives us some estimates of the sum of the series if
it is convergent. From (19.21-1) we have, on letting $n \to \infty$,

\[ \sum_{n=N+1}^{\infty} u_n \le \int_N^{\infty} f(x)\,dx \le \sum_{n=N}^{\infty} u_n. \tag{19.21-3} \]

Actually neither inequality can be an equality. That is, (19.21-3) remains true if
the sign $\le$ is replaced by $<$ at both places where it occurs. We leave it for the
student to supply the argument in justification of this assertion.
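
The estimates (19.21-3) can be illustrated with $f(x) = 1/x^2$, for which $\int_N^{\infty} f(x)\,dx = 1/N$. The Python sketch below relies on one fact that is not established in this section and is assumed here only to obtain reference values: the full series $\sum_{n=1}^{\infty} 1/n^2$ has the sum $\pi^2/6$. The function name and the sample values of $N$ are likewise illustrative assumptions.

```python
import math


def tail_sum(N):
    """Sum of 1/n^2 over n >= N, using the assumed value pi^2/6 of the full series."""
    return math.pi ** 2 / 6 - sum(1.0 / n ** 2 for n in range(1, N))


for N in (1, 2, 10, 100):
    lower = tail_sum(N + 1)      # sum from n = N + 1
    integral = 1.0 / N           # integral of 1/x^2 from N to infinity
    upper = tail_sum(N)          # sum from n = N
    print(N, lower, integral, upper)
```

In each row the integral lies strictly between the two tails, in agreement with the remark that neither inequality in (19.21-3) can be an equality.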


EXERCISES 

1. Test the following series for convergence or divergence by Theorem IV. For the 
convergent series give an upper bound for the sum of the series with the aid of (19.21-3). 


(a) 2 77= 


n-l Vn +1-1 


(e) 2 


n(log n) : 


(b) 2d 


2 n 


(c) 2 


ift>n -4 

1 


(« 2 


n = 4 n log n[log(log n)) 


2 * 


(d) 2 


~i(2n-l)(2n) 
1 


(g) 2 


10 * 


^2 n log n 
2. Show that the series 


^2 (log n) 

(h> 2 

n—2 U 


1 + 1 


2(log 2) 3(log 3) [ 


n (log n y 


is convergent if p > 1, divergent if p ^ 1. 

3. For what values of p does the series 2 

1 


4. Show that the series 2 


(log nf 


«=4 n log n[log(log n)] p 
diverges for all values of p. 


converge? 


5. For what values of p and q is the series J] — convergent? 

n = 2 fl 

6. Let $C_n = 1 + \tfrac{1}{2} + \cdots + (1/n) - \log n$. Put $N = 1$, $f(x) = 1/x$ in (19.21-1), with $n - 1$ in
place of $n$, and show that $0 < 1/n \le C_n$. Also show that $C_n - C_{n+1} > 0$. The sequence $C_n$ is
therefore convergent, since it is decreasing and bounded below. The number $C =
\lim_{n\to\infty} C_n$ is known as Euler’s constant. Its value is approximately 0.577.

Show by a similar argument that

\[ \lim_{n\to\infty}\left[1 + \frac{1}{\sqrt{2}} + \cdots + \frac{1}{\sqrt{n}} - 2(\sqrt{n} - 1)\right] \]

exists.


19.22 / RATIO TESTS 

In this section we deal with series all of whose terms are positive. In many cases
it proves to be useful to consider the ratio $u_{n+1}/u_n$ of two successive terms of the
series $\sum u_n$.

THEOREM V. Let $\sum u_n$ and $\sum v_n$ be two series of positive terms, and suppose
that the inequality

\[ \frac{v_{n+1}}{v_n} \le \frac{u_{n+1}}{u_n} \tag{19.22-1} \]

holds for all values of $n$. Then if $\sum u_n$ is convergent, so is $\sum v_n$.

The same conclusion can of course be drawn if (19.22-1) holds merely
for all sufficiently large values of $n$.

Proof. We have

\[ \frac{v_2}{v_1} \le \frac{u_2}{u_1}, \quad \frac{v_3}{v_2} \le \frac{u_3}{u_2}, \quad \ldots, \quad \frac{v_n}{v_{n-1}} \le \frac{u_n}{u_{n-1}}. \]

Hence

\[ v_n = v_1\cdot\frac{v_2}{v_1}\cdot\frac{v_3}{v_2}\cdots\frac{v_n}{v_{n-1}} \le v_1\cdot\frac{u_2}{u_1}\cdot\frac{u_3}{u_2}\cdots\frac{u_n}{u_{n-1}} = \frac{v_1}{u_1}u_n, \]

so that $v_n \le Cu_n$, where $C$ is the positive constant $v_1/u_1$. The convergence of $\sum v_n$
now follows by an application of Theorem II (§19.2).

One of the most useful applications of Theorem V is obtained by choosing
for $\sum u_n$ the geometric series $1 + r + r^2 + \cdots + r^n + \cdots$, which for positive $r$ is
convergent if $r < 1$. For this choice of $u_n$ we have $u_{n+1}/u_n = r$. The following
theorem is an immediate corollary of Theorem V.

THEOREM VI. The series $\sum v_n$ of positive terms is convergent if there is a
positive number $r < 1$ such that $v_{n+1}/v_n \le r$ for all sufficiently large values of $n$.

The ratio $v_{n+1}/v_n$ need not approach a limit as $n \to \infty$, but it does so in many
cases arising in practice.

THEOREM VII. Let $\sum v_n$ be a series of positive terms, and suppose that the ratio
$v_{n+1}/v_n$ approaches a limit $t$ as $n \to \infty$. Then the series is convergent if $0 \le t < 1$
and divergent if $t > 1$. If $v_{n+1}/v_n \to +\infty$ we write $t = +\infty$; in this case the series
diverges.

Proof. If $0 \le t < 1$, choose $r$ so that $t < r < 1$. Then $v_{n+1}/v_n$ is close to $t$ when
$n$ is large, and so $v_{n+1}/v_n < r$ for all sufficiently large values of $n$. The convergence
of the series is then assured by Theorem VI. On the other hand, if
$t > 1$, then $v_{n+1} > v_n$ if $n$ is large enough, and in this case the terms of the series
cannot approach zero as $n \to \infty$. Hence the series cannot be convergent (see the
final paragraph of §19).

No conclusion can be drawn if t = 1, for this case can occur both with 
convergent and with divergent series. For instance, in the case of the series 
(19.21-2) it turns out that t = 1 for all values of p. 
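
The behavior of the ratio $v_{n+1}/v_n$ in the decisive and in the indecisive case can be seen numerically. In the Python sketch below, the series with terms $v_n = n/2^n$ is a hypothetical example chosen for illustration (it is not taken from the text); the terms $1/n^p$ are those of (19.21-2). All function names are assumptions.

```python
def successive_ratio(term, n):
    """Return v_{n+1} / v_n for a series whose nth term is term(n)."""
    return term(n + 1) / term(n)


def sample_term(n):
    """Hypothetical example v_n = n / 2^n (an assumption for this sketch)."""
    return n / 2 ** n


def p_series_term(p):
    """Return the term function n -> 1/n^p of the series (19.21-2)."""
    def term(n):
        return 1.0 / n ** p
    return term


for n in (10, 100, 500):
    print(n, successive_ratio(sample_term, n))

for p in (0.5, 1, 2):
    print(p, successive_ratio(p_series_term(p), 10 ** 6))
```

For the sample series the ratio settles near $\tfrac{1}{2}$, so Theorem VII gives convergence. For every $p$ the ratio of the series (19.21-2) approaches 1, although the series converges for $p = 2$ and diverges for $p = \tfrac{1}{2}$ and $p = 1$, which is why the case $t = 1$ settles nothing.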
00 n

Example. The series 2 is convergent. 


and 


n -> oc V n L 

Further uses of the theorems of this section will be developed in §19.4. 


EXERCISES 

1. Test the following series for convergence or divergence by Theorem VII. 


{a) v 

W „4,2-5--(3n-l) 


(b) 2 


(n !) 2 2" 
(2 n + 2)! 


(e) 2 ~ 


n\ 


(f) 2 


2 • 4 


2 n 


£ti4 • 7 • • ■ (3n + 1) 


(C) .?.2"‘ (8) „? 2 3"-'(n- 1)!' 

oc 

(d) 2 "(ir- 

rt = 1 

2. If $0 < r < 1$ and $p$ is any positive integer, show that the series

\[ \sum_{n=p+1}^{\infty} n(n-1)\cdots(n-p)\,r^n \]

is convergent.

3. Suppose $\sum u_n$ and $\sum v_n$ are series of positive terms satisfying (19.22-1). Prove that,
if $\sum v_n$ is divergent, so is $\sum u_n$. Is this theorem equivalent to Theorem V?

4. Suppose u n > 0 and ^ 1 - — + if n> 2. Show that 2 u n is convergent. 

u n ti n 

5. Suppose u n > 0 and 1 - — ■ Show that is divergent. 

u n n 


19.3 / ABSOLUTE AND CONDITIONAL CONVERGENCE 


Theorems I-VII all deal with series whose terms are nonnegative. Thus far we 
have developed no general methods for discussing the convergence of a given 
series if its terms are unrestricted as to sign. On the other hand, in §19.1 and 
§19.11 we have incidentally met some examples of convergent series with both 
positive and negative terms, e.g., 


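$$\log 2 = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots + (-1)^{n-1}\frac{1}{n} + \cdots \tag{19.3-1}$$

$$\frac{1}{e} = 1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^{n}\frac{1}{n!} + \cdots \tag{19.3-2}$$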
Now there is an important difference between the two series (19.3-1) and
(19.3-2). Each of the series is convergent as it stands. But let us consider the
series which are obtained if we make the alteration of changing the signs of all
the negative terms. We then have

$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} + \cdots \tag{19.3-3}$$

$$1 + 1 + \frac{1}{2!} + \cdots + \frac{1}{n!} + \cdots \tag{19.3-4}$$

The change from (19.3-1) to (19.3-3) has given us a divergent series (the
harmonic series (19.2-1)), whereas in the change from (19.3-2) to (19.3-4) we still
have a convergent series (its value is $e$; see (19.1-6)).
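
The contrast between (19.3-1) and (19.3-3) can be seen numerically. The following is only an illustrative sketch; the helper names `partial_sum`, `alt_harmonic`, and `harmonic` are hypothetical and not part of the text. It prints partial sums of both series: the alternating sums settle near log 2, while the harmonic sums keep growing.

```python
import math

def partial_sum(term, n):
    """Return term(1) + term(2) + ... + term(n)."""
    return sum(term(k) for k in range(1, n + 1))

def alt_harmonic(k):
    """k-th term of the series (19.3-1)."""
    return (-1) ** (k - 1) / k

def harmonic(k):
    """k-th term of the series (19.3-3)."""
    return 1 / k

if __name__ == "__main__":
    print("log 2 =", math.log(2))
    for n in (10, 1000, 100000):
        print(n, partial_sum(alt_harmonic, n), partial_sum(harmonic, n))
```

A finite computation of this kind proves nothing about convergence; it only illustrates the behavior that the theorems below establish.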

If a series of positive terms is to be convergent, the terms with large index n 
must be so small that even the sum of arbitrarily many of them is small. But with 
a series of terms in which infinitely many are positive and infinitely many are 
negative, the series may be convergent because of a partial cancelling out effect, 
e.g. a negative term offsetting the cumulative effect of one or more positive 
terms. This kind of process may operate to produce a convergent series even 
though the series would not converge if all the terms were replaced by their 
absolute values. This situation is illustrated by the convergent series (19.3-1) and 
its divergent counterpart (19.3-3). With this background of explanation we now 
make a definition. 


Definition. Let $\sum u_n$ be a given series. Consider the series obtained by
putting $|u_n|$ in place of $u_n$. We say that $\sum u_n$ is absolutely convergent provided that
the series $\sum |u_n|$ is convergent.

The careful student will observe at once that the definition of absolute
convergence of $\sum u_n$ does not in itself make any statement about the mere
convergence of $\sum u_n$. It is in fact true, however, that if $\sum |u_n|$ is convergent, then
so is $\sum u_n$. Before proving this we must take up a general criterion for convergence.

THEOREM VIII. A series $\sum u_n$ is convergent if and only if to each $\epsilon > 0$ there
corresponds some integer $N$ (depending on $\epsilon$ and the particular series) such
that

$$|u_{m+1} + u_{m+2} + \cdots + u_n| < \epsilon \tag{19.3-5}$$

for all integers $m$, $n$ such that $N \le m < n$.

Proof. This is a direct consequence of Cauchy’s convergence condition 
(Theorem VI, §16.5). We write 


$$s_n = u_1 + u_2 + \cdots + u_n.$$

Then, for $m < n$,

$$s_n - s_m = u_{m+1} + \cdots + u_n.$$

The condition (19.3-5) is now seen to be equivalent to $|s_n - s_m| < \epsilon$ and
the theorem is merely the statement of Cauchy’s condition for the sequence $\{s_n\}$.

Theorem VIII is not very useful for making direct tests to find whether a 
series is convergent. But it is of fundamental importance in the general theory of 
infinite series, and we shall use it in proving various things about series. The first 
such application deals with absolute convergence. 

THEOREM IX. If the series $\sum u_n$ is absolutely convergent, it is convergent.

Proof. We are assuming that $\sum |u_n|$ is convergent. By Theorem VIII,
therefore, we know that to any $\epsilon > 0$ there corresponds some $N$ such that

$$|u_{m+1}| + |u_{m+2}| + \cdots + |u_n| < \epsilon$$

whenever $N \le m < n$. Now

$$|u_{m+1} + u_{m+2} + \cdots + u_n| \le |u_{m+1}| + \cdots + |u_n|.$$

Hence $|u_{m+1} + \cdots + u_n| < \epsilon$ whenever $N \le m < n$. This is precisely condition
(19.3-5), and so we conclude that the series $\sum u_n$ is convergent.

Definition . A series is called conditionally convergent if it is convergent , but not 
absolutely convergent. 

The series (19.3-1) is conditionally convergent. 

We shall consider some of the differences between absolutely convergent
and conditionally convergent series. In the series $\sum u_n$ let us denote by $a_1$, $a_2$,
$a_3, \ldots$ the positive terms, taken in the order of their occurrence; let the negative
terms be denoted by $-b_1$, $-b_2$, $-b_3, \ldots$. Thus, in the series

$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots$$

we have

$$a_1 = 1, \quad a_2 = \tfrac{1}{3}, \quad a_3 = \tfrac{1}{5}, \ldots,$$

$$b_1 = \tfrac{1}{2}, \quad b_2 = \tfrac{1}{4}, \quad b_3 = \tfrac{1}{6}, \ldots.$$

We now consider the two series $\sum a_n$ and $\sum b_n$, each of which consists entirely of
positive terms.

THEOREM X. If the series $\sum u_n$ is absolutely convergent, then each of the series
$\sum a_n$, $\sum b_n$ is convergent, and $\sum u_n = \sum a_n - \sum b_n$. But if the series $\sum u_n$ is
conditionally convergent, then each of the series $\sum a_n$, $\sum b_n$ is divergent.

The theorem is illustrated for the absolutely convergent case by the series
(19.3-2), which is the difference of the two convergent series

$$1 + \frac{1}{2!} + \frac{1}{4!} + \cdots, \qquad 1 + \frac{1}{3!} + \frac{1}{5!} + \cdots.$$

In the case of the series (19.3-1) each of the constituent series

$$1 + \frac{1}{3} + \frac{1}{5} + \cdots, \qquad \frac{1}{2} + \frac{1}{4} + \frac{1}{6} + \cdots$$

is divergent.


Proof of the theorem. Suppose that $\sum u_n$ is absolutely convergent, with
$M = \sum |u_n|$. Then

$$|u_1| + |u_2| + \cdots + |u_n| \le M,$$

no matter how large $n$ is. Now consider a partial sum of the series $\sum a_n$, say
$a_1 + \cdots + a_m$. Since each $a_i$ is a positive term somewhere in the series $\sum u_n$, the
terms $a_1, \ldots, a_m$ all occur in the sum $|u_1| + \cdots + |u_n|$ if $n$ is sufficiently large. But
then we see that

$$a_1 + a_2 + \cdots + a_m \le M.$$

It follows by Theorem I (§19.2) that the series $\sum a_n$ is convergent. In the same
way we see that $b_1 + \cdots + b_m \le M$, since each $b_i$ is a term somewhere in the
series $\sum |u_n|$. Thus the series $\sum b_n$ is convergent.

Now suppose that, in the sum $u_1 + \cdots + u_n$, the number of positive terms is
$p_n$ and the number of negative terms is $q_n$. Then

$$u_1 + \cdots + u_n = (a_1 + \cdots + a_{p_n}) - (b_1 + \cdots + b_{q_n}). \tag{19.3-6}$$

In the case of absolute convergence we let $n \to \infty$ and obtain the result

$$\sum_{n=1}^{\infty} u_n = \sum_{n=1}^{\infty} a_n - \sum_{n=1}^{\infty} b_n.$$

It may happen that there are only a finite number of $a$’s or a finite number of
$b$’s, or possibly none of one kind or the other. In these cases the series is of
course absolutely convergent if it is convergent at all, since its terms from some
point onward are all of one sign. Let us then consider the case in which there are
infinitely many terms of each sign, so that $p_n \to \infty$ and $q_n \to \infty$ as $n \to \infty$. Let

$$s_n = u_1 + \cdots + u_n, \quad A_{p_n} = a_1 + \cdots + a_{p_n}, \quad B_{q_n} = b_1 + \cdots + b_{q_n},$$

so that (19.3-6) becomes $s_n = A_{p_n} - B_{q_n}$. We also have

$$|u_1| + \cdots + |u_n| = (a_1 + \cdots + a_{p_n}) + (b_1 + \cdots + b_{q_n}) = A_{p_n} + B_{q_n}. \tag{19.3-7}$$

Now suppose that the series $\sum u_n$ is convergent, and consider the series $\sum a_n$,
$\sum b_n$. If either of these latter series is convergent, so is the other, by virtue of the
relation $s_n = A_{p_n} - B_{q_n}$. For instance, if $\sum a_n$ is convergent, $B_{q_n}$ approaches a limit,
since $s_n$ and $A_{p_n}$ each approach limits, and $B_{q_n} = A_{p_n} - s_n$. But to say that
$\lim_{n \to \infty} B_{q_n}$ exists is equivalent to saying that $\sum b_n$ is convergent, since $\{B_{q_n}\}$ is a
nondecreasing sequence, and is therefore convergent if and only if the partial
sums of the series $\sum b_n$ are bounded. But if both the series $\sum a_n$, $\sum b_n$ are
convergent, the series $\sum |u_n|$ is convergent, by (19.3-7). Thus we see that, if $\sum u_n$
is conditionally convergent, both of the series $\sum a_n$, $\sum b_n$ must be divergent. This
completes the proof of Theorem X.
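
As a numerical illustration of the conditionally convergent case of Theorem X, the sketch below splits the first $2n$ terms of $1 - \frac12 + \frac13 - \cdots$ into the sums written $A_{p_n}$ and $B_{q_n}$ in the proof. The function name `parts` is a hypothetical helper introduced only for this example.

```python
import math

def parts(n):
    """Return (A, B) for the first 2n terms of 1 - 1/2 + 1/3 - ...

    A is the sum of the n positive terms 1, 1/3, ..., 1/(2n-1);
    B is the sum of the n magnitudes 1/2, 1/4, ..., 1/(2n).
    """
    A = sum(1 / (2 * k - 1) for k in range(1, n + 1))
    B = sum(1 / (2 * k) for k in range(1, n + 1))
    return A, B

if __name__ == "__main__":
    for n in (10, 1000, 100000):
        A, B = parts(n)
        # A and B both keep growing, yet A - B stays close to log 2.
        print(n, A, B, A - B, math.log(2))
```

Both $A$ and $B$ grow without bound as $n$ increases, while their difference remains near $\log 2$, which is the behavior the theorem describes.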


EXERCISES 

1. If $\sum c_n$ is a convergent series of positive terms, the series $\sum c_n x^n$ is absolutely
convergent when $|x| \le 1$. Prove this.

2. Show that, if $0 \le r < 1$, the series $\sum r^n \sin n\theta$ and $\sum r^n \cos n\theta$ are absolutely
convergent for all values of $\theta$.

3. Which of the following series are absolutely convergent? 

(a) 1-jj + j! — ^+ -. 

(b) 1 + + j! 


(c) 


(d) 2 


log 2 2 log 3 

sin nd 


1 

3 log 4 


+ (-!)" 


1 

( n - 1) log n 


n 


4. Let $\{a_n\}$ and $\{b_n\}$ be sequences of real numbers with the following properties:
(i) $b_n > 1$;

(ii) to each $\epsilon > 0$ corresponds some positive integer $N$ such that
$C_{n,p} \log(b_n b_{n+1} \cdots b_{n+p}) < \epsilon$ if $p$ is any positive integer, where $C_{n,p}$ is the maximum of $|a_k|$
for $n \le k \le n + p$, and $N \le n$.

Then prove that

(a) $\sum a_n \log b_n$ is absolutely convergent, and that

(b) if there are infinitely many nonzero $a_n$’s, the series $\sum \log b_n$ is also convergent.


19.31 / REARRANGEMENT OF TERMS 

Suppose that we have two convergent series with values $s$ and $t$ respectively:

$$s = u_1 + u_2 + \cdots + u_n + \cdots,$$
$$t = v_1 + v_2 + \cdots + v_n + \cdots.$$

We may combine these series by adding corresponding terms, and the result will
be a new convergent series whose value is $s + t$:

$$s + t = w_1 + w_2 + \cdots + w_n + \cdots,$$

where $w_1 = u_1 + v_1$, $w_2 = u_2 + v_2, \ldots, w_n = u_n + v_n, \ldots$. This assertion is easily
proved as a direct application of the rule that the limit of a sum is the sum of the
limits, for

$$(w_1 + \cdots + w_n) = (u_1 + \cdots + u_n) + (v_1 + \cdots + v_n).$$

Now let us consider the question: Suppose the series $\sum u_n$ is convergent;
what can we say about a series which is obtained by using the same terms $u_n$, but
in a different order? It may surprise the student to learn that (a) if the new series
is convergent, it does not necessarily have the same sum as the original series,
and (b) the new series may be divergent.

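Example 1. By rearranging the terms in the series (19.3-1), whose sum is
$\log 2$, we can obtain the following result:

$$\tfrac{3}{2} \log 2 = 1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \frac{1}{9} + \frac{1}{11} - \frac{1}{6} + \cdots. \tag{19.31-1}$$

The scheme in the rearrangement is to take two positive terms and one negative
term, then two more positive terms and another negative term, and so on. To
prove (19.31-1) we note that

$$\log 2 = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \frac{1}{7} - \cdots, \tag{19.31-2}$$

$$\tfrac{1}{2} \log 2 = \frac{1}{2} - \frac{1}{4} + \frac{1}{6} - \frac{1}{8} + \frac{1}{10} - \frac{1}{12} + \frac{1}{14} - \cdots.$$

In this last series we may insert zero terms without affecting the value. Thus

$$\tfrac{1}{2} \log 2 = 0 + \frac{1}{2} + 0 - \frac{1}{4} + 0 + \frac{1}{6} + 0 - \cdots. \tag{19.31-3}$$

On adding (19.31-2) and (19.31-3) term by term we get

$$\tfrac{3}{2} \log 2 = 1 + 0 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + 0 + \frac{1}{7} - \frac{1}{4} + \cdots.$$

This is the same as (19.31-1) when we drop out the zero terms.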
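
The rearrangement in (19.31-1) can be checked numerically. The sketch below generates the terms in the order "two positive, one negative" and sums a large number of them; the names `rearranged_terms` and `rearranged_sum` are hypothetical helpers, not notation from the text.

```python
import math
from itertools import islice

def rearranged_terms():
    """Yield 1, 1/3, -1/2, 1/5, 1/7, -1/4, ... as in (19.31-1)."""
    odd, even = 1, 2  # next unused odd and even denominators
    while True:
        yield 1 / odd
        yield 1 / (odd + 2)
        yield -1 / even
        odd += 4
        even += 2

def rearranged_sum(n_terms):
    """Sum of the first n_terms terms of the rearranged series."""
    return sum(islice(rearranged_terms(), n_terms))

if __name__ == "__main__":
    print(rearranged_sum(300000), 1.5 * math.log(2))
```

The partial sums approach $\frac32 \log 2$ rather than $\log 2$, even though exactly the same terms are used as in (19.31-2).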


Example 2. In this example we shall not be so explicit. The series (19.31-2) is
conditionally convergent, so that the series composed of its positive terms is
divergent. This means that by taking $k$ large enough we can make

$$1 + \frac{1}{3} + \frac{1}{5} + \cdots + \frac{1}{2k-1}$$

as large as we please. We now rearrange the series (19.31-2) according to the
following plan: First take just enough of the positive terms to obtain a sum
greater than 2. Then take the first negative term, $-\frac{1}{2}$. This leaves us with more
than $\frac{3}{2}$. Now take just enough more positive terms to increase the total sum
beyond 4, and then take one more negative term, $-\frac{1}{4}$. Continuing in this way, we
build up partial sums such that when the term $-1/(2n)$ is taken, the total exceeds
$2n - (1/n)$. The series so formed must diverge.

What we have just seen in Examples 1 and 2 is typical of conditionally
convergent series. In fact, it may be proved that the terms in a conditionally
convergent series may be rearranged so as to produce a series which has any
desired sum, or such that the partial sums tend to $+\infty$ or to $-\infty$. By contrast, no
such thing can happen with absolutely convergent series.
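
The claim that a conditionally convergent series can be rearranged to have any desired sum can be illustrated with a simple greedy scheme, sketched below for the series (19.31-2). This is an assumed illustration, not the proof referred to in the text: whenever the running total is at or below the target we take the next unused positive term, and otherwise the next unused negative term. The function name `rearrange_to` is hypothetical.

```python
def rearrange_to(target, n_terms):
    """Greedy rearrangement of 1 - 1/2 + 1/3 - ... aimed at `target`.

    Returns the partial sum after n_terms terms of the rearranged series.
    """
    odd, even = 1, 2  # denominators of the next unused positive / negative term
    total = 0.0
    for _ in range(n_terms):
        if total <= target:
            total += 1 / odd   # take the next positive term
            odd += 2
        else:
            total -= 1 / even  # take the next negative term
            even += 2
    return total

if __name__ == "__main__":
    for target in (2.0, -1.0, 0.25):
        print(target, rearrange_to(target, 200000))
```

Every term of the original series is eventually used exactly once, because the positive terms alone and the negative terms alone both form divergent series, so the total must cross the target infinitely often.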


THEOREM XI. Let the series $\sum u_n$ be absolutely convergent, with sum $s$. Let $\sum v_n$
be any series obtained by a rearrangement of the terms of $\sum u_n$ (i.e., every $v_i$ is
some $u_j$ and every $u_k$ is some $v_l$). Then $\sum v_n$ is convergent, with sum $s$.

Proof. First let us prove the assertion on the assumption that all the $u$’s (and
hence all the $v$’s) are nonnegative. Since $s = \sum u_n$ and since each $v_i$ is some $u_j$, it
is clear that the partial sums of the series $\sum v_n$ cannot exceed $s$. Thus the series
$\sum v_n$ must be convergent and its sum $s'$ must satisfy the inequality $s' \le s$.
Reversing the roles of $\sum u_n$ and $\sum v_n$, we see that $s \le s'$. Therefore $s' = s$. This
completes the proof for the case of series of nonnegative terms. Now, in the
general case of an absolutely convergent series, we have

$$\sum u_n = \sum a_n - \sum b_n$$

in the notation of Theorem X (§19.3). In the rearranged series $\sum v_n$, the
separation into positive and negative terms yields

$$\sum v_n = \sum a'_n - \sum b'_n,$$

where $\sum a'_n$ is a rearrangement of $\sum a_n$ and $\sum b'_n$ is a rearrangement of $\sum b_n$. By
what has just been proved for series of positive terms, we have

$$\sum a'_n = \sum a_n, \qquad \sum b'_n = \sum b_n.$$

Hence the series $\sum v_n$ is convergent, with the same sum as $\sum u_n$.

EXERCISES 

1. Show that log 2 = I+ 3 + 5 — \ — + H • *. 

2. Show that log 2 - l= 3 - 5 + 5 - 3 + 7 -s+* • 

Here the positive terms have odd denominators, and the negative terms have even 
integers as denominators. The terms alternate in sign. 

3. Show that i log 2= l-J-i + i- s — s+ s ^ * * 

4. Let $\sum u_n$ be a convergent series, and let $\sum v_n$ be a rearrangement of it. In the
rearrangement, suppose that no term of the original series is moved more than $N$ places
from its original position, where $N$ is a fixed number. Show that the new series is
convergent and has the same value as the old one.


19.32 / ALTERNATING SERIES 

The simplest type of series having both positive and negative terms is the type in 
which the successive terms alternate in sign. Many commonly occurring series 
are of this type, and have the additional property that the magnitude of the terms 
steadily decreases toward zero as n increases. Concerning such series we have a 
theorem of practical importance as a test for convergence. 

THEOREM XII. Suppose the terms of the series $\sum u_n$ are alternately positive and
negative, that $|u_{n+1}| \le |u_n|$ for all $n$, and that $u_n \to 0$ as $n \to \infty$. Then the series
is convergent.

Proof. For convenience let us write the series in the form

$$c_1 - c_2 + c_3 - c_4 + \cdots + c_{2n-1} - c_{2n} + \cdots,$$

where $c_n > 0$. There are two kinds of partial sums, depending on whether the
sum ends with a positive or a negative term:

$$s_{2n-1} = c_1 - c_2 + \cdots + c_{2n-1},$$
$$s_{2n} = c_1 - c_2 + \cdots - c_{2n}.$$

If we consider the sequences $\{s_{2n-1}\}$ and $\{s_{2n}\}$ separately, we observe that

$$s_1 \ge s_3 \ge s_5 \ge \cdots,$$

$$s_2 \le s_4 \le s_6 \le \cdots.$$

These inequalities depend on the fact that $c_{n+1} \le c_n$. For example,

$$s_7 = s_5 - c_6 + c_7 = s_5 - (c_6 - c_7) \le s_5$$

since $c_6 - c_7 \ge 0$, and

$$s_8 = s_6 + c_7 - c_8 \ge s_6$$

since $c_7 - c_8 \ge 0$. Moreover, $c_1 - c_2 \le s_n \le c_1$ for all values of $n$ (see Fig. 175).
Therefore each of the sequences $\{s_{2n-1}\}$, $\{s_{2n}\}$ is convergent, being monotonic and
bounded. But $s_{2n-1} - s_{2n} = c_{2n}$, and since $c_{2n} \to 0$, it follows that both sequences
have the same limit. But then the sequence $\{s_n\}$, where $n$ runs through all
positive integers, is convergent, its limit being the common limit of $s_1$, $s_3$,
$s_5, \ldots$ and $s_2$, $s_4$, $s_6, \ldots$. The theorem is therefore proved.

Fig. 175. [The partial sums on a number line between $c_1 - c_2 = s_2$ and $s_1 = c_1$.]


It is worth while noting that in a series of the kind just described, the partial
sum $u_1 + \cdots + u_n$ differs from the sum of the series by not more than $|u_{n+1}|$, i.e., by
not more than the first term not included in the partial sum.


Example 1. The series

$$\frac{1}{\sqrt{1}} - \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{3}} - \cdots + (-1)^{n-1}\frac{1}{\sqrt{n}} + \cdots$$

is convergent, since the conditions of Theorem XII are satisfied.

Example 2. Compute $e^{-1}$ with an error not exceeding 0.005 by using (19.3-2):

$$e^{-1} = 1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \cdots.$$

It will be sufficient to stop with the term involving $\frac{1}{n!}$ if $n$ is chosen so that
$\frac{1}{(n+1)!} \le 0.005$. Thus we use the approximation

$$e^{-1} \approx 1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \frac{1}{5!},$$

the next term in the series being

$$\frac{1}{6!} = \frac{1}{720} = 0.0014 \text{ (approximately)}.$$

This gives

$$e^{-1} \approx \frac{44}{120} = \frac{11}{30} = 0.3666$$

with an error less than 0.0014. Our work actually shows that

$$0.3666 < e^{-1} < 0.3681.$$
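
The arithmetic of Example 2 can be reproduced exactly with rational numbers. The sketch below is an illustration only; the helper name `partial` is hypothetical. It forms the partial sum of (19.3-2) through the term with $5!$ and compares the actual error with the first omitted term $1/6!$.

```python
from fractions import Fraction
from math import exp, factorial

def partial(n):
    """Exact partial sum 1 - 1 + 1/2! - ... + (-1)^n / n! of (19.3-2)."""
    return sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1))

if __name__ == "__main__":
    approx = partial(5)                 # stops with the term -1/5!
    bound = Fraction(1, factorial(6))   # first omitted term, 1/720
    print(approx, float(approx))
    print("error bound:", float(bound))
    print("actual error:", abs(exp(-1) - float(approx)))
```

Because the omitted term is positive, the true value lies between `partial(5)` and `partial(6)`, which is the two-sided estimate stated at the end of the example.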


EXERCISES 

1. Show by Theorem XII that each of the following series is convergent. Prove 
carefully that each condition of the theorem is fulfilled. 

n 


1 2 3 

<•) ?"? + y + 


+ ( -» (^1? + 


(b) log!- log 5 + log! - 


(c) 

(d) 


1+1 

1 


V2 V3 


1 + 2 1 + 3 

1*3 


2 * 4 2*4*6 


+ ■•• + (-1 )" 


1 ■ 3 • •• {In - 1) 


3*6 


3 n 


2*4* 

1 


2 n ■ (2 n + 2 ) 


(e) 1} " ' 1 • 4 • ■ • (3n — 2) n 2 

2. Test the following series for convergence or divergence. 
log 2 log 3 log 4 

(S) V2 V3 V4 " ' ‘ 

■ 

(c) (1 — log 2)- (1 - 2 log l) + (l - 3 logs) . 

(d) 5 — l+s — ! + ! — !+•■•. 

3. Show that the conditions of Theorem XII are fulfilled by the series 

2 (- l) n — - for any fixed positive q and any p, provided the initial value n = N is 

n = N n 

large enough (how large will depend on p and q). 

4. The series

$$1 - \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{3}} - \frac{1}{\sqrt{4}} + \frac{1}{\sqrt{5}} - \cdots$$

is convergent, by Theorem XII. Why is it conditionally convergent? Show that the
rearrangement

$$1 + \frac{1}{\sqrt{3}} - \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{5}} + \frac{1}{\sqrt{7}} - \frac{1}{\sqrt{4}} + \cdots$$

diverges to $+\infty$. Suggestion. Compare the sum $S_{3n}$ of the first $3n$ terms of the rearranged
series with the sum $s_{2n}$ of the first $2n$ terms of the original series, and show that

$$S_{3n} - s_{2n} \ge \frac{n}{\sqrt{4n-1}}.$$

Since $s_{2n}$ approaches a limit as $n \to \infty$, it follows that $S_{3n} \to +\infty$. How does one show that
$S_n \to +\infty$?


19.4 / TESTS FOR ABSOLUTE CONVERGENCE 

In a very large number of important cases the most convenient test for absolute
convergence is the following one, which is known as d’Alembert’s ratio test
(after J. le R. d’Alembert, 1717-1783).


THEOREM XIII. Let $\sum u_n$ be a series with all its terms different from zero. Then
the series is absolutely convergent if there is a positive number $r < 1$ such
that

$$\left|\frac{u_{n+1}}{u_n}\right| < r$$

for sufficiently large values of $n$. In particular, this condition is
satisfied if the limit

$$t = \lim_{n \to \infty} \left|\frac{u_{n+1}}{u_n}\right|$$

exists and $t < 1$. But if $t > 1$ the series is divergent.


Proof. The assertions regarding absolute convergence are direct applications of
Theorems VI and VII (§19.22) to the series $\sum |u_n|$. If $t > 1$ the general term $u_n$ cannot
approach zero, and the series $\sum u_n$ cannot be convergent.


The theorem makes no assertion about what happens if t = 1. The series may 
then converge absolutely, or conditionally, or it may diverge. 

Theorem XIII is very useful in dealing with power series, as the following 
example shows. 


Example 1. The series

$$1 - \frac{1}{2}x + \frac{1 \cdot 3}{2 \cdot 4}x^2 - \frac{1 \cdot 3 \cdot 5}{2 \cdot 4 \cdot 6}x^3 + \cdots + (-1)^n \frac{1 \cdot 3 \cdots (2n-1)}{2 \cdot 4 \cdots 2n}x^n + \cdots \tag{19.4-1}$$

is absolutely convergent if $|x| < 1$, and divergent if $|x| > 1$.
Here we have (calling the first term $u_0$)

$$u_n = (-1)^n \frac{1 \cdot 3 \cdots (2n-1)}{2 \cdot 4 \cdots 2n}x^n, \qquad u_{n+1} = (-1)^{n+1} \frac{1 \cdot 3 \cdots (2n+1)}{2 \cdot 4 \cdots (2n+2)}x^{n+1},$$

$$\frac{u_{n+1}}{u_n} = -\frac{2n+1}{2n+2}\,x, \qquad \lim_{n \to \infty} \left|\frac{u_{n+1}}{u_n}\right| = |x|.$$

The assertions about (19.4-1) are now seen to follow by application of Theorem
XIII (with $t = |x|$). For the present we avoid a discussion of the behavior of the
series (19.4-1) when $|x| = 1$ (but see Example 3).
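
The ratio computed in Example 1 can be tabulated directly. In the sketch below, `coeff` and `ratio` are hypothetical helper names; `coeff(n)` is the coefficient $\frac{1 \cdot 3 \cdots (2n-1)}{2 \cdot 4 \cdots 2n}$ and `ratio(n, x)` is $|u_{n+1}/u_n|$, which should approach $|x|$.

```python
def coeff(n):
    """1*3*...*(2n-1) / (2*4*...*2n), with coeff(0) = 1."""
    c = 1.0
    for k in range(1, n + 1):
        c *= (2 * k - 1) / (2 * k)
    return c

def ratio(n, x):
    """|u_{n+1} / u_n| for the series (19.4-1)."""
    return coeff(n + 1) / coeff(n) * abs(x)

if __name__ == "__main__":
    for x in (0.5, -0.9, 2.0):
        print(x, [round(ratio(n, x), 6) for n in (1, 10, 100, 1000)])
```

For each fixed $x$ the printed ratios increase toward $|x|$, in agreement with the closed form $\frac{2n+1}{2n+2}|x|$.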

Example 2. If $m$ is any number except 0 or a positive integer, the series

$$1 + mx + \frac{m(m-1)}{2!}x^2 + \cdots + \frac{m(m-1) \cdots (m-n+1)}{n!}x^n + \cdots \tag{19.4-2}$$

is absolutely convergent if $|x| < 1$ and divergent if $|x| > 1$.

The ratio $u_{n+1}/u_n$ is readily found to be

$$\frac{u_{n+1}}{u_n} = \frac{m(m-1) \cdots (m-n)\; n!\; x^{n+1}}{m(m-1) \cdots (m-n+1)\,(n+1)!\; x^n} = \frac{m-n}{n+1}\,x,$$

so that

$$\lim_{n \to \infty} \left|\frac{u_{n+1}}{u_n}\right| = \lim_{n \to \infty} \left|\frac{m-n}{n+1}\right| |x| = |x|.$$

Application of Theorem XIII gives the result as asserted.

For cases in which $\lim_{n \to \infty} \left|\frac{u_{n+1}}{u_n}\right| = 1$ there is a convenient test which is
sometimes effective. It was first established by L. J. Raabe in 1832, and is known as
Raabe’s test. The essential idea of the test is to use Theorem V (§19.22), taking
one of the series in that theorem to be the series

$$\frac{1}{1^p} + \frac{1}{2^p} + \cdots + \frac{1}{n^p} + \cdots,$$

which is known to converge if $p > 1$ and diverge if $p \le 1$. Let $a_n = n^{-p}$. Then

$$\frac{a_{n+1}}{a_n} = \frac{n^p}{(n+1)^p} = \left(\frac{n}{n+1}\right)^p = \left(1 + \frac{1}{n}\right)^{-p}.$$

Hence, by Theorem V, we have the following criterion. If $p > 1$ and if

$$\left|\frac{u_{n+1}}{u_n}\right| \le \left(1 + \frac{1}{n}\right)^{-p} \tag{19.4-3}$$

for all sufficiently large values of $n$, the series $\sum u_n$ is absolutely convergent. This
criterion in itself is not very useful, however, for it is not easy in practice to tell
whether the inequality (19.4-3) is satisfied. To improve matters let us proceed as
follows: We set $f(x) = (1 + x)^{-p}$ and expand $f(x)$ in powers of $x$ by Taylor’s
formula with remainder. We use (4.3-7) with $a = 0$, $h = x$, $n = 1$. The result is

$$(1 + x)^{-p} = 1 - px + \frac{p(p+1)}{2!}(1 + \theta x)^{-p-2}x^2, \qquad 0 < \theta < 1.$$

This is valid if $x > -1$. We put $x = 1/n$. Thus

$$\left(1 + \frac{1}{n}\right)^{-p} = 1 - \frac{p}{n} + \frac{A_n}{n^2}, \qquad A_n = \frac{p(p+1)}{2!}\left(1 + \frac{\theta}{n}\right)^{-p-2}.$$

The exact form of $A_n$ is unimportant. The only essential thing is that $A_n$ is
bounded as $n \to \infty$. Now (19.4-3) can be put in the form

$$\left|\frac{u_{n+1}}{u_n}\right| \le 1 - \frac{p}{n} + \frac{A_n}{n^2};$$

we rewrite this as

$$n\left(1 - \left|\frac{u_{n+1}}{u_n}\right|\right) \ge p - \frac{A_n}{n}. \tag{19.4-4}$$

Now when $p > 1$ and $n$ is very large, the expression $p - (A_n/n)$ is greater than 1.
The form of the inequality now suggests that we consider the limit

$$t = \lim_{n \to \infty} n\left(1 - \left|\frac{u_{n+1}}{u_n}\right|\right), \tag{19.4-5}$$

provided this limit exists.


THEOREM XIV. (RAABE’S TEST.) Suppose that the limit (19.4-5) exists (either as
a finite limit or as $+\infty$ or $-\infty$). Then the series $\sum u_n$ is absolutely convergent if
$t > 1$, but not if $t < 1$.


Proof. Suppose $t > 1$, and choose $p$ so that $1 < p < t$ (here $t$ may be $+\infty$).
Then certainly we shall have

$$n\left(1 - \left|\frac{u_{n+1}}{u_n}\right|\right) > p$$

for all sufficiently large values of $n$. Consequently the inequality (19.4-4) must
hold, since $A_n > 0$. This is equivalent to (19.4-3), and therefore the series $\sum u_n$ is
absolutely convergent. To show that the series does not converge absolutely if
$t < 1$ it suffices to show that if we take $p$ such that $t < p < 1$, then

$$\left|\frac{u_{n+1}}{u_n}\right| \ge \left(1 + \frac{1}{n}\right)^{-p}$$

for all sufficiently large values of $n$. For, by Theorem V, the convergence of
$\sum |u_n|$ would imply the convergence of $\sum n^{-p}$, and the latter series is divergent.
We omit the details of the demonstration, which are very similar to the details of
the first part of the proof, with all the inequalities reversed.

Raabe’s test is indecisive if $t = 1$ (see Exercise 9).

Example 3. Consider the series of Example 1 when $x = \pm 1$. In that case

$$\left|\frac{u_{n+1}}{u_n}\right| = \frac{2n+1}{2n+2}, \qquad n\left(1 - \left|\frac{u_{n+1}}{u_n}\right|\right) = \frac{n}{2n+2},$$

so that the limit $t$ in (19.4-5) is $\frac{1}{2}$. Therefore, by Theorem XIV, the series (19.4-1)
is not absolutely convergent when $|x| = 1$. This shows that it actually diverges
when $x = -1$, for the terms are all positive in that case. When $x = 1$, however,
the terms of the series are alternating in sign, and we can show that the series is
conditionally convergent. The series is

$$1 - \frac{1}{2} + \frac{1 \cdot 3}{2 \cdot 4} - \frac{1 \cdot 3 \cdot 5}{2 \cdot 4 \cdot 6} + \cdots + (-1)^n \frac{1 \cdot 3 \cdots (2n-1)}{2 \cdot 4 \cdots 2n} + \cdots.$$

It is clear that the terms steadily decrease in magnitude as $n \to \infty$, so that by
Theorem XII all we need to show is that

$$\lim_{n \to \infty} \frac{1 \cdot 3 \cdots (2n-1)}{2 \cdot 4 \cdots 2n} = 0. \tag{19.4-6}$$

Let

$$c_n = \frac{1 \cdot 3 \cdots (2n-1)}{2 \cdot 4 \cdots 2n}.$$

Then

$$c_n < \frac{2 \cdot 4 \cdots 2n}{3 \cdot 5 \cdots (2n+1)} = \frac{1}{c_n} \cdot \frac{1}{2n+1},$$

and therefore

$$c_n^2 < \frac{1}{2n+1}, \qquad c_n < \frac{1}{\sqrt{2n+1}};$$

the truth of (19.4-6) is now evident.
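
Both computations of Example 3 can be checked numerically. The sketch below uses hypothetical helper names `c` and `raabe`: `c(n)` is the coefficient $c_n$ defined in the example, and `raabe(n)` is the quantity $n(1 - |u_{n+1}/u_n|)$ whose limit is the $t$ of (19.4-5) when $|x| = 1$.

```python
import math

def c(n):
    """c_n = 1*3*...*(2n-1) / (2*4*...*2n), with c(0) = 1."""
    value = 1.0
    for k in range(1, n + 1):
        value *= (2 * k - 1) / (2 * k)
    return value

def raabe(n):
    """n * (1 - |u_{n+1}/u_n|) for the series (19.4-1) with |x| = 1."""
    return n * (1 - c(n + 1) / c(n))

if __name__ == "__main__":
    for n in (1, 10, 100, 10000):
        print(n, raabe(n), c(n), 1 / math.sqrt(2 * n + 1))
```

The second column approaches $\frac12$, and the third column stays below the fourth, which is the bound $c_n < 1/\sqrt{2n+1}$ used to establish (19.4-6).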

There is a somewhat more powerful test than Raabe’s test. It is essentially a 
test for series of positive terms, and therefore may be used to test for absolute 
convergence by applying the test to $\sum |u_n|$. The test reads as follows:


THEOREM XV. Suppose that $\left|\frac{u_{n+1}}{u_n}\right|$ can be expressed in the form

$$\left|\frac{u_{n+1}}{u_n}\right| = 1 - \frac{p}{n} + \frac{A_n}{n^q}, \tag{19.4-7}$$

where $q > 1$ and the sequence $\{A_n\}$ is bounded. Then the series $\sum u_n$ is
absolutely convergent if $p > 1$, and not absolutely convergent (either divergent
or conditionally convergent) if $p \le 1$.


The test in Theorem XV is due to Karl Friedrich Gauss (1777-1855), one of 
the greatest mathematicians in history. We leave the proof to the student, with 
certain guiding suggestions; see Exercise 10. 

Example 4. Consider the series (19.4-2) when $|x| = 1$. As before, assume that
$m$ is not zero or a positive integer. Then

$$\left|\frac{u_{n+1}}{u_n}\right| = \left|\frac{m-n}{n+1}\right|.$$

For sufficiently large values of $n$ we shall have $|m - n| = n - m$, and hence

$$\left|\frac{u_{n+1}}{u_n}\right| = \frac{n-m}{n+1} = 1 - \frac{m+1}{n+1} = 1 - \frac{m+1}{n} + \frac{m+1}{n(n+1)},$$

so that (19.4-7) holds with $p = m + 1$, $q = 2$, $A_n = \frac{(m+1)n}{n+1}$. It follows that the
series (19.4-2) is absolutely convergent when $|x| = 1$ if and only if $m + 1 > 1$, i.e.,
if and only if $m > 0$.


There is a test due to Cauchy which, although not as easy to apply in many 
cases as d’Alembert’s ratio test (Theorem XIII), is more powerful (i.e., has wider 
theoretical applicability). It has important applications in the theory of power 
series. 


THEOREM XVI. (CAUCHY’S ROOT TEST.) The series $\sum u_n$ is absolutely
convergent if there is a positive number $r < 1$ such that $|u_n|^{1/n} \le r$ for all sufficiently
large values of $n$. This condition will be satisfied if $\lim_{n \to \infty} |u_n|^{1/n}$ exists and is less
than unity. But, if $|u_n|^{1/n} \ge 1$ for an infinite number of values of $n$, the series $\sum u_n$
is divergent.


Proof. In the first case we have $|u_n| \le r^n$, and the absolute convergence
follows by comparison with the convergent geometric series $\sum r^n$. In the second
case $u_n$ cannot approach zero as $n \to \infty$ (since $|u_n| \ge 1$ for infinitely many values
of $n$). Hence the series is divergent.


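Example 5. The series

$$\sum \left(\frac{1 + \sin\frac{n\pi}{2}}{2}\right)^{n} x^n$$

is absolutely convergent if $|x| < 1$, and divergent if $|x| \ge 1$. For,

$$|u_n|^{1/n} = \left(\frac{1 + \sin\frac{n\pi}{2}}{2}\right)|x| \le |x|$$

for all $n$, and

$$|u_n|^{1/n} = |x|$$

if $n = 1, 5, 9, 13, \ldots$. The statements about the series now follow immediately
from Theorem XVI.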

Cauchy’s test is indecisive if $\lim_{n \to \infty} |u_n|^{1/n} = 1$, as we see by applying it to the series
$\sum n^{-p}$ with $p = 1$ and 2, successively.
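
The indecisive case of the root test can be observed numerically. The sketch below evaluates $|u_n|^{1/n}$ for $u_n = n^{-p}$; the function name `nth_root_of_term` is a hypothetical helper. For both $p = 1$ (a divergent series) and $p = 2$ (a convergent series) the values approach 1.

```python
def nth_root_of_term(n, p):
    """Return |u_n|**(1/n) for the term u_n = n**(-p)."""
    return (n ** (-p)) ** (1 / n)

if __name__ == "__main__":
    for p in (1, 2):
        print(p, [nth_root_of_term(n, p) for n in (10, 1000, 10**6)])
```

Since the limit is 1 in both cases while one series diverges and the other converges, the limit alone cannot decide convergence.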
EXERCISES

1. Apply d’Alembert’s ratio test to each series and state explicitly the conclusions 
which you draw from use of the test. In parts (a)-(e) make a separate investigation and 
decide whether the series converges or diverges for each of the values of x for which the 
ratio test is indecisive. 


w 2 -p- 

n = itl 

V n 1 • 

(b) „?,n 2 (n+ 2) x 


(e) 2 px". 

n = I fl 

(f) 2 


nl 


& i3 • 5 • • * {In + 1) 


x . 


(0 2£- 

n— lX 


(g) 2 ^=ix". 

n = t Z 


(d) 2 


3 ■ 5 • • (2n + 1) 
5 • 10 • • 5n ' 


(h) 2 


(2 n) n 


x . 


„^i(n + l) n+1 

2. Apply Cauchy’s root test to each series and state explicitly the conclusions which 
you draw from use of the test. 


<*> 

oc 

(b) 2 P >0. 


(c) 2 


(e) 5 + 5+(3) 2 +(l) 2 + (5) 3 + (5) 3 + ". 

(f) iX + i ■ ix 2 + (I) 2 • lx 3 + & 2 (I)V + (1) 3 (1)V + d) 3 d)V + • • • . 

(g) f 


(d) 2 


2 (log n) n 

1 


(h) 2( 


3 + 2co S f)'g)’. 


„e 2 (log n) lo8n 

3. Apply Raabe’s test to each series and state explicitly the conclusions which you 
draw from use of the test. In (e) and (f) use the formula for (l + x) -p developed for the 
proof of Raabe’s test. 


(a) 2 

n = 

(b) f 


2 • 4 


2n 


n = i3 • 5 • • • (2n + 1) 
2 • 4 • * ■ 2n 


(c) 2 


i5 -7 
5-6 


(2n + 3) 
(4+n) 




^,8-9 

4-5 


(d) Sr 


(7+ n ) (n + 1) - 

(3 + n) 5 • 6 • 


(g) 2 


1 • 3 • • • (2n - 1) 3 ■ 7 • ■ • (3n - 1) 
2 ■ 4 * • • (2n ) 5 ■ 9 * • • (4n + 1) 


(4+n) 


^,8 -9 - (7 + n) nl 

4. Test each series in Exercise 3 by Gauss’s test. Observe that Gauss’s test is more 
effective than Raabe’s test in parts (e), (f), (g). 

5. Let $0 < a < b < 1$. Show by Cauchy’s root test that the series

$$a + b + a^2 + b^2 + a^3 + b^3 + \cdots$$

is convergent. Attempt to test the series by d’Alembert’s ratio test and state clearly the
outcome of your attempt.

6. Consider the series

$$a + ab + a^2 b + a^2 b^2 + a^3 b^2 + \cdots,$$

in which each term is derived from its predecessor by multiplying alternately by $b$ and by
$a$. Show that, if $0 < a < 1$ and $0 < b < 1$, the series may be proved convergent by
d’Alembert’s ratio test. Assuming $a$ and $b$ to be positive, prove that the series is
convergent if $ab < 1$ and divergent if $ab \ge 1$.

7. Assume that none of the numbers $a$, $b$, $c$ is a negative integer or zero. Prove that
the series

$$\frac{ab}{c} + \frac{a(a+1)b(b+1)}{2!\,c(c+1)} + \frac{a(a+1)(a+2)b(b+1)(b+2)}{3!\,c(c+1)(c+2)} + \cdots$$

is absolutely convergent if $c > a + b$ and divergent if $c \le a + b$. Why is conditional
convergence not possible?

8. (a) Suppose $a_n > 0$ and $t = \lim_{n \to \infty} \dfrac{\log(1/a_n)}{\log n}$. Show that the series $\sum a_n$ is convergent if
$t > 1$ and divergent if $t < 1$. Suggestion: Show that $t > 1$ implies $a_n < n^{-p}$ if $1 < p < t$
and $n$ is sufficiently large.

(b) Apply the test of part (a) to each of the series 

f 1 f 1 

»-A Vn/ ’ „= 2 (logn) ,ogn ’ n = 2 (log n) ,og(,ogn) 

9. For each of the series

$$\sum_{n=1}^{\infty} \left[\frac{1 \cdot 3 \cdots (2n-1)}{2 \cdot 4 \cdots 2n}\right]^2, \qquad \sum_{n=2}^{\infty} \frac{1}{n(\log n)^2}$$

show that $t = 1$ in Raabe’s test. Show by other methods that the first series is divergent
and the second is convergent. This shows that Raabe’s test is indecisive when $t = 1$.

10. Prove Theorem XV with the aid of the following suggestions. If $p \ne 1$, the
conclusions are easily drawn with the aid of Raabe’s test. Write out the argument
explicitly. For the case $p = 1$, let


= 1 

"" (n - l)log(n - 1) ; 

use Taylor’s formula with remainder to show that 


log 



I_^!L 
n n 2 ’ 


where A n is bounded as n -> oo. Then show that 

V n + 1 ,1 1 Bn 

= 1 : 2 9 

v n n n log n n 

where B n is bounded as n -><». From this show that 


Un + 1 

u n 


Vn+l 

V n 


is positive when n is sufficiently large. Now complete the proof of Theorem XV. 
19.5 / THE BINOMIAL SERIES

Let us consider the Taylor’s series expansion in powers of $x$ of the function
$f(x) = (1 + x)^m$, where $m$ is any real number. We have

$$f(x) = (1 + x)^m$$
$$f'(x) = m(1 + x)^{m-1}$$
$$f''(x) = m(m-1)(1 + x)^{m-2}$$
$$\cdots$$
$$f^{(k)}(x) = m(m-1) \cdots (m-k+1)(1 + x)^{m-k}$$

Thus, $f(0) = 1$, and for $k = 1, 2, \ldots$

$$f^{(k)}(0) = m(m-1)(m-2) \cdots (m-k+1).$$

Taylor’s formula with remainder is

$$(1 + x)^m = 1 + mx + \frac{m(m-1)}{2!}x^2 + \cdots + \frac{m(m-1) \cdots (m-n+1)}{n!}x^n + R_{n+1}.$$

We shall use Cauchy’s formula for $R_{n+1}$ (Theorem IV, §4.3). From (4.3-14) with
$a = 0$ and $h = x$ we have

$$R_{n+1} = \frac{f^{(n+1)}(\theta x)}{n!}\,x^{n+1}(1 - \theta)^n, \qquad 0 < \theta < 1.$$

If $m$ happens to be a positive integer or zero we see that $f^{(k)}(x) = 0$ when $k > m$.
Hence in this case $R_{m+1} = 0$ and we have

$$(1 + x)^m = 1 + mx + \frac{m(m-1)}{2!}x^2 + \frac{m(m-1)(m-2)}{3!}x^3 + \cdots + x^m.$$

This is just the ordinary binomial-expansion formula.

In the rest of this section we shall assume that $m$ is not a positive integer or
zero. The formula for $R_{n+1}$ becomes

$$R_{n+1} = \frac{m(m-1) \cdots (m-n)(1 + \theta x)^{m-n-1}}{n!}\,x^{n+1}(1 - \theta)^n,$$

$$R_{n+1} = \frac{m(m-1) \cdots (m-n)}{n!}\left(\frac{1 - \theta}{1 + \theta x}\right)^{n}(1 + \theta x)^{m-1}x^{n+1}. \tag{19.5-2}$$


Now suppose that $-1 < x < 1$. Then

$$0 < \frac{1 - \theta}{1 + \theta x} < 1.$$

If $m > 1$ we have $0 < (1 + \theta x)^{m-1} < (1 + |x|)^{m-1}$, while if $m < 1$ we have

$$(1 + \theta x)^{m-1} = \frac{1}{(1 + \theta x)^{1-m}} < \frac{1}{(1 - |x|)^{1-m}},$$

so that $0 < (1 + \theta x)^{m-1} < (1 - |x|)^{m-1}$. Thus, if $-1 < x < 1$,

$$|R_{n+1}| \le \left|\frac{m(m-1) \cdots (m-n)}{n!}\right| (1 \pm |x|)^{m-1}\,|x|^{n+1}, \tag{19.5-3}$$

the choice of the double sign in $(1 \pm |x|)^{m-1}$ depending on the sign of $m - 1$. From
(19.5-3) we shall show that $R_{n+1} \to 0$ as $n \to \infty$ if $|x| < 1$. Since $(1 \pm |x|)^{m-1}$ is
independent of $n$, it is sufficient to show that

$$\lim_{n \to \infty} \frac{m(m-1) \cdots (m-n)}{n!}\,x^{n+1} = 0. \tag{19.5-4}$$

Now let

$$u_n = \frac{m(m-1) \cdots (m-n)}{n!}\,x^{n+1},$$

and consider the series $\sum u_n$. This series is convergent if $|x| < 1$, as we see by
applying the ratio test of Theorem XIII, for

$$\left|\frac{u_{n+1}}{u_n}\right| = \left|\frac{m-n-1}{n+1}\right| |x| \to |x| \quad \text{as } n \to \infty.$$

Since the series converges, it follows that $u_n \to 0$ as $n \to \infty$; thus (19.5-4) is
established.

We have thus proved the validity of the series expansion

$$(1 + x)^m = 1 + mx + \frac{m(m-1)}{2!}x^2 + \cdots + \frac{m(m-1) \cdots (m-n+1)}{n!}x^n + \cdots \tag{19.5-5}$$

when $|x| < 1$. This is called the binomial series. Except when $m$ is a positive
integer or zero it is a nonterminating series, none of the coefficients vanishing.
For a discussion of the validity of (19.5-5) when $x = \pm 1$, see the Exercises. The
student will note that the series (19.5-5) is the same as the series (19.4-2), which
was discussed in Examples 2 and 4 of §19.4. Observe, however, that there is a
logical difference between proving merely that the series converges and proving
that its sum is equal to $(1 + x)^m$.
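
The expansion (19.5-5) can be checked numerically for sample values of $m$ and $x$ with $|x| < 1$. The sketch below is an illustration only; the function name `binomial_partial_sum` is hypothetical. It builds each term from the previous one using the ratio $\frac{m-n}{n+1}x$ found in Example 2 of §19.4.

```python
def binomial_partial_sum(m, x, n_terms):
    """Sum of the first n_terms terms of the binomial series (19.5-5)."""
    term, total = 1.0, 0.0
    for n in range(n_terms):
        total += term
        # Next term: u_{n+1} = u_n * (m - n) / (n + 1) * x.
        term *= (m - n) / (n + 1) * x
    return total

if __name__ == "__main__":
    for m, x in ((0.5, 0.3), (-2, -0.5), (1 / 3, -0.8)):
        print(m, x, binomial_partial_sum(m, x, 200), (1 + x) ** m)
```

When $m$ is a positive integer the factor $m - n$ eventually vanishes, so the same routine reproduces the finite binomial expansion.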


EXERCISES 

The purpose of this set of exercises is to guide the student in completing the 
discussion of the binomial series when x = ±1. Certain results of a general nature are 
needed, and these are taken up first. 

1. Suppose $u_n > 0$ and

\[
\frac{u_{n+1}}{u_n} = 1 - \frac{p}{n} + \frac{A_n}{n^q},
\]

where $p > 0$, $q > 1$, and $A_n$ is bounded as $n \to \infty$. Show that, if $0 < r < p$, $u_n$ satisfies an
inequality of the form $u_n \le Cn^{-r}$, where $C$ is a constant, for all sufficiently large values of
$n$. As a consequence, $u_n \to 0$ as $n \to \infty$. Suggestion: Let $v_n = n^{-r}$ and use the formula
(developed in §19.4)

\[
\frac{v_{n+1}}{v_n} = 1 - \frac{r}{n} + \frac{B_n}{n^2},
\]

where $B_n$ is bounded as $n \to \infty$, to show that

\[
\frac{v_{n+1}}{v_n} - \frac{u_{n+1}}{u_n}
\]

has the same sign as $p - r$ when $n$ is sufficiently large. Then, by an argument like that in
the proof of Theorem V, draw the desired conclusion.

2. Let

\[
u_n = \frac{a(a+1)\cdots(a+n-1)\,b(b+1)\cdots(b+n-1)}{n!\,c(c+1)\cdots(c+n-1)},
\]

where none of the numbers $a$, $b$, $c$ is zero or a negative integer. Suppose that
$1 + c - (a + b) > 0$. Show that $u_n \to 0$ as $n \to \infty$. Use the result of Exercise 1.

3. Consider the binomial series for $x = -1$. Assume throughout that $m$ is not zero or
a positive integer. (a) Explain why the terms of the series are all of the same sign for
sufficiently large values of $n$, and show that the series converges if $m > 0$, but diverges if
$m < 0$. Note Example 4, §19.4. Actually this is a special case of Exercise 7, §19.4. (b)
Follow the suggestions given and so prove that the binomial expansion (19.5-5) is valid if
$m > 0$ and $x = -1$. Start with Taylor's formula with integral remainder (Theorem II, §4.2),
taking $f(x) = (1+x)^m$, $a = 0$, $x = -1$. In this way one finds

\[
(1-1)^m = 1 - m + \frac{m(m-1)}{2!} - \cdots
+ \frac{m(m-1)\cdots(m-n+1)}{n!}(-1)^n + R_{n+1},
\]

\[
R_{n+1} = (-1)^{n+1}\,\frac{(m-1)(m-2)\cdots(m-n)}{n!}.
\]

Then

\[
R_{n+1} = -\frac{(-m+1)(-m+2)\cdots(-m+n)}{n!}.
\]

Now use the result of Exercise 2 to show that $\lim_{n\to\infty} R_{n+1} = 0$. This proves that if $x = -1$
the binomial series is convergent, with sum 0, when $m > 0$.

4. Consider the binomial series for $x = 1$.

(a) Show that, for sufficiently large values of $n$, the terms of the series alternate in sign. Show
also that the series is divergent if $m \le -1$ and convergent if $m > -1$. For this last part use the
result of Exercise 2. The general term is

\[
u_n = \frac{m(m-1)\cdots(m-n+1)}{n!}.
\]

(b) Show that the binomial expansion (19.5-5) is valid if $x = 1$ and $m > -1$. Start with
Taylor's formula as in Exercise 3(b), this time putting $x = 1$, and getting

\[
(1+1)^m = u_0 + u_1 + \cdots + u_n + R_{n+1},
\]

\[
R_{n+1} = \frac{m(m-1)\cdots(m-n)}{n!}\int_0^1 (1-t)^n (1+t)^{m-n-1}\,dt.
\]

If $n > m - 1$ the value of the integral is less than $1/(n+1)$. Explain why this is so.
Thus

\[
|R_{n+1}| < \left|\frac{m(m-1)\cdots(m-n)}{(n+1)!}\right| = |u_{n+1}|.
\]

What is the rest of the argument?


19.6 / MULTIPLICATION OF SERIES 

We have already noted, at the beginning of §19.31, that term-by-term addition (or 
subtraction) of convergent series is a legitimate operation. This is convenient for 
obtaining new series expansions from expansions already established. 

Example 1. Show that, if $|x| < 1$,

\[
\frac12 \log\frac{1+x}{1-x} = x + \frac{x^3}{3} + \frac{x^5}{5} + \cdots. \tag{19.6-1}
\]

We get this series expansion by observing that

\[
\log\frac{1+x}{1-x} = \log(1+x) - \log(1-x)
\]

and using (19.1-1). From this latter formula we have

\[
\log(1+x) = x - \tfrac12 x^2 + \tfrac13 x^3 - \tfrac14 x^4 + \tfrac15 x^5 - \cdots
\]

if $-1 < x \le 1$. Consequently, replacing $x$ by $-x$, we have

\[
\log(1-x) = -x - \tfrac12 x^2 - \tfrac13 x^3 - \tfrac14 x^4 - \tfrac15 x^5 - \cdots
\]

if $-1 \le x < 1$. Combining by subtraction, which is valid if $-1 < x < 1$, since both
series are then convergent, we have

\[
\log(1+x) - \log(1-x) = 2x + \tfrac23 x^3 + \tfrac25 x^5 + \cdots.
\]

This is equivalent to (19.6-1).

It is likewise convenient to be able to find new series expansions by
multiplying known series expansions. For instance, suppose we wish to expand

\[
\left(\frac{1+x}{1+x^2}\right)^{1/2} = (1+x)^{1/2}(1+x^2)^{-1/2}
\]

in powers of $x$. We might proceed as follows: The expansion of $(1+x)^{1/2}$ is a
particular case of (19.5-5); to a few terms it is

\[
(1+x)^{1/2} = 1 + \tfrac12 x - \tfrac18 x^2 + \tfrac1{16} x^3 - \tfrac5{128} x^4 + \cdots. \tag{19.6-2}
\]

To get the series for $(1+x^2)^{-1/2}$ we put $x^2$ in place of $x$ in (19.5-5) and set $m = -\tfrac12$.
This is legitimate if $|x| < 1$, since then $x^2 < 1$ also. To a few terms the series is

\[
(1+x^2)^{-1/2} = 1 - \tfrac12 x^2 + \tfrac38 x^4 - \tfrac5{16} x^6 + \cdots. \tag{19.6-3}
\]

Thus

\[
\left(\frac{1+x}{1+x^2}\right)^{1/2}
= \left(1 + \tfrac12 x - \tfrac18 x^2 + \tfrac1{16} x^3 - \cdots\right)
  \left(1 - \tfrac12 x^2 + \tfrac38 x^4 - \cdots\right).
\]

The next question is: How do we multiply the two series together to get a new
series? Proceeding just as though the series were finite sums, we might write
down the following scheme, which arises by multiplying the second series
successively by each term of the first series:

\[
\begin{array}{l}
1 - \tfrac12 x^2 + \tfrac38 x^4 - \tfrac5{16} x^6 + \cdots \\
\tfrac12 x - \tfrac14 x^3 + \tfrac3{16} x^5 - \cdots \\
-\tfrac18 x^2 + \tfrac1{16} x^4 - \tfrac3{64} x^6 + \cdots \\
\tfrac1{16} x^3 - \tfrac1{32} x^5 + \cdots
\end{array}
\]

There will be an infinite number of rows, each row being an infinite series. But
we observe that there are only a finite number of terms of each degree, so that if
we collect together terms of like degree, we obtain for the first few terms

\[
1 + \tfrac12 x - \tfrac58 x^2 - \tfrac3{16} x^3 + \cdots. \tag{19.6-4}
\]

It is clear that there is here a systematic process, but it remains to prove that the 
process gives a series which has as its sum the product of the sums of the two 
original series. There is a general theorem which justifies the process. 


THEOREM XVII. Suppose that each of the series $\sum u_n$, $\sum v_n$ is absolutely
convergent, with sums $U$ and $V$ respectively:

\[
U = u_0 + u_1 + u_2 + \cdots, \tag{19.6-5}
\]
\[
V = v_0 + v_1 + v_2 + \cdots. \tag{19.6-6}
\]

Let $w_0 = u_0v_0$, $w_1 = u_0v_1 + u_1v_0$, and in general

\[
w_n = u_0v_n + u_1v_{n-1} + \cdots + u_nv_0. \tag{19.6-7}
\]

Then the series $\sum w_n$ is absolutely convergent, and its sum is $UV$:

\[
UV = w_0 + w_1 + w_2 + \cdots. \tag{19.6-8}
\]

Moreover, any infinite series which has as its terms the products $u_iv_j$ ($i$ and
$j \ge 0$) arranged in any order, each product occurring once and only once, is
absolutely convergent, with sum $UV$.


Proof. Let us consider the array

\[
\begin{array}{ccccc}
u_0v_0 & u_0v_1 & \cdots & u_0v_n & \cdots \\
u_1v_0 & u_1v_1 & \cdots & u_1v_n & \cdots \\
\vdots & \vdots &        & \vdots &        \\
u_nv_0 & u_nv_1 & \cdots & u_nv_n & \cdots \\
\vdots & \vdots &        & \vdots &
\end{array} \tag{I}
\]

and the similar array in which $u_iv_j$ is replaced by $|u_i|\,|v_j|$. Let us denote the
second array by (II). Finally, let

\[
A = |u_0| + |u_1| + \cdots + |u_n| + \cdots,
\]
\[
B = |v_0| + |v_1| + \cdots + |v_n| + \cdots.
\]

These last series are convergent because of the assumption that (19.6-5) and
(19.6-6) are absolutely convergent. Now consider any series formed from the
terms of the array (II), taken in some definite order. Such a series is convergent,
since each of its partial sums is less than or equal to $AB$. For, any such partial
sum is less than or equal to

\[
(|u_0| + |u_1| + \cdots + |u_n|)(|v_0| + |v_1| + \cdots + |v_n|)
\]

if $n$ is taken sufficiently large, and this last product is certainly no larger than
$AB$, since the first factor does not exceed $A$ and the second factor does not
exceed $B$. It follows that any infinite series formed by taking the terms of the
array (I) in some definite order is absolutely convergent. Since, of any two such
series, one is merely a rearrangement of the other, Theorem XI (§19.31) assures
us that all such series have the same sum.

Now one possible arrangement is that in which we take $u_0v_0$ first, then $u_0v_1$,
$u_1v_1$, and $u_1v_0$; then $u_0v_2$, $u_1v_2$, $u_2v_2$, $u_2v_1$, and $u_2v_0$; and at the $n$th stage all terms $u_iv_j$
for which $i$ and $j$ do not exceed $n$ and at least one of them is equal to $n$. At the $n$th
stage our partial sum is exactly

\[
(u_0 + u_1 + \cdots + u_n)(v_0 + v_1 + \cdots + v_n),
\]

which is the sum of all the terms in an upper left square portion of the array (I).
Hence, by (19.6-5) and (19.6-6) the limit of the partial sums is $UV$, and this must
be the value of the series (by the fact that the limit of a product is the product of
the limits).

Another arrangement is that in which we take first $u_0v_0$, then $u_0v_1$ and $u_1v_0$,
then $u_0v_2$, $u_1v_1$, and $u_2v_0$, and at the $n$th stage all terms $u_iv_j$ for which $i + j = n$.
This arrangement gives the same sum $UV$, of course. But if we group the terms
according to the stage, we get

\[
u_0v_0 + (u_0v_1 + u_1v_0) + (u_0v_2 + u_1v_1 + u_2v_0) + \cdots,
\]

which is exactly the series (19.6-8). The insertion of the parentheses technically
changes the series, of course, since the last series has the sum of all products in
one parenthesis as a single term, whereas previously each product $u_iv_j$ was a
single term. But the insertion of parentheses into a convergent series always
leads to a convergent series whose sum is the same as that of the original series
(see Exercise 7). Therefore (19.6-8) holds, and the proof of the theorem is
complete.

As an application of the theorem let us return to (19.6-4). This series was
obtained by multiplication of the series (19.6-2) and (19.6-3) by the rule of
Theorem XVII. Since each of the two latter series is absolutely convergent if
$|x| < 1$, the series (19.6-4) is absolutely convergent if $|x| < 1$, and its sum is the
product of the sums of the other two series, namely $(1+x)^{1/2}(1+x^2)^{-1/2}$.

Observe that, if the series

\[
a_0 + a_1x + a_2x^2 + \cdots + a_ix^i + \cdots,
\]
\[
b_0 + b_1x + b_2x^2 + \cdots + b_jx^j + \cdots
\]

are multiplied according to Theorem XVII, the resulting series is

\[
a_0b_0 + (a_0b_1 + a_1b_0)x + (a_0b_2 + a_1b_1 + a_2b_0)x^2 + \cdots,
\]

the coefficient of $x^n$ in the last series being

\[
a_0b_n + a_1b_{n-1} + \cdots + a_nb_0.
\]

To justify this application of Theorem XVII in any given case we need merely to
check on the absolute convergence of the first two series.
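
The coefficient rule $a_0b_n + a_1b_{n-1} + \cdots + a_nb_0$ is easy to mechanize.
The Python sketch below is only an illustration under our own naming
(`cauchy_product` and `binomial_coeffs` are hypothetical helpers, not taken from
the text). It works with exact fractions and multiplies truncated coefficient
lists for $(1+x)^{1/2}$ and $(1+x^2)^{-1/2}$, reproducing the first four
coefficients of (19.6-4). Only coefficients up to the shorter list's length are
returned, since higher ones would need terms that were truncated away.

```python
from fractions import Fraction


def cauchy_product(a, b):
    """Coefficients c_n = a_0 b_n + a_1 b_(n-1) + ... + a_n b_0.

    Only indices n < min(len(a), len(b)) are returned, because only those
    are fully determined by the truncated inputs.
    """
    n_max = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(n_max)]


def binomial_coeffs(m, count):
    """First `count` binomial coefficients m(m-1)...(m-n+1)/n!, n = 0, 1, ..."""
    coeffs = [Fraction(1)]
    for n in range(count - 1):
        coeffs.append(coeffs[-1] * (m - n) / (n + 1))
    return coeffs


half = Fraction(1, 2)
sqrt_series = binomial_coeffs(half, 4)  # (1 + x)^(1/2): 1, 1/2, -1/8, 1/16
inv = binomial_coeffs(-half, 2)         # (1 + y)^(-1/2) in powers of y = x^2
inv_sqrt_series = [inv[0], Fraction(0), inv[1], Fraction(0)]
product = cauchy_product(sqrt_series, inv_sqrt_series)

if __name__ == "__main__":
    print(product)
```

Exact rational arithmetic is used so that the output can be compared directly
with the fractions in the text; the justification that the product series
actually sums to the product of the functions still rests on Theorem XVII.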


Example 2. Find the series expansion of $\dfrac{1}{1-x}\log\dfrac{1}{1-x}$ in powers of $x$.

We know from (19-3) that

\[
\frac{1}{1-x} = 1 + x + x^2 + \cdots = \sum_{n=0}^{\infty} x^n \tag{19.6-9}
\]

if $|x| < 1$; also, from (19.1-1),

\[
\log(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \cdots = -\sum_{n=1}^{\infty} \frac{x^n}{n}
\]

if $-1 \le x < 1$, so that

\[
\log\frac{1}{1-x} = \sum_{n=1}^{\infty} \frac{x^n}{n}. \tag{19.6-10}
\]

Each of these series is absolutely convergent if $|x| < 1$, as may be verified by
Theorem XIII. The coefficients in (19.6-9) are $a_n = 1$, $n = 0, 1, 2, \ldots$; those in
(19.6-10) are $b_0 = 0$, $b_n = 1/n$, $n = 1, 2, \ldots$. Therefore $a_0b_0 = 0$, and if $n \ge 1$,

\[
a_0b_n + a_1b_{n-1} + \cdots + a_nb_0 = \frac1n + \frac{1}{n-1} + \cdots + 1.
\]

Thus

\[
\frac{1}{1-x}\log\frac{1}{1-x}
= \sum_{n=1}^{\infty}\left(1 + \frac12 + \cdots + \frac1n\right)x^n
= x + \left(1 + \tfrac12\right)x^2 + \left(1 + \tfrac12 + \tfrac13\right)x^3 + \cdots.
\]

This series is absolutely convergent if $|x| < 1$.

EXERCISES

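1. Multiply the series for $(1-x)^{-1}$ (series (19-3), valid when $|x| < 1$) by itself and so
obtain the expansion

\[
\frac{1}{(1-x)^2} = \sum_{n=0}^{\infty} (n+1)x^n, \qquad |x| < 1.
\]

2. Derive the expansion

\[
\frac{1}{(1-x)^3} = \sum_{n=0}^{\infty} \frac{(n+1)(n+2)}{2!}\,x^n \qquad (|x| < 1)
\]

by multiplication of series. Use the result of Exercise 1.

3. Obtain the expansion, valid when $0 < x < 1$,

\[
\frac{1}{1-x}\,\frac{\tan^{-1}\sqrt{x}}{\sqrt{x}}
= 1 + \left(1 - \tfrac13\right)x + \left(1 - \tfrac13 + \tfrac15\right)x^2
+ \left(1 - \tfrac13 + \tfrac15 - \tfrac17\right)x^3 + \cdots.
\]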

4. Prove by multiplication of series that if

\[
f(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!}, \quad \text{then} \quad f(x)f(y) = f(x+y)
\]

for arbitrary $x$ and $y$.

5. Prove that

\[
2\sum_{m=0}^{\infty} (-1)^m \frac{x^{2m+1}}{(2m+1)!}\;\sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!}
= \sum_{p=0}^{\infty} (-1)^p \frac{(2x)^{2p+1}}{(2p+1)!}.
\]

6. Using no other properties of $\sin x$ and $\cos x$ than the fact that the series
representations (19.1-8) and (19.1-9) are valid, prove that $\sin^2 x + \cos^2 x = 1$.

7. Let $\sum u_n$ be a convergent series with sum $s$. Let sets of parentheses be introduced
into the series, e.g.:

\[
(u_1 + u_2) + (u_3 + u_4 + u_5) + (u_6) + (u_7 + u_8) + \cdots.
\]

Let the sum of the terms in the $n$th parenthesis be $v_n$. Prove that the series $\sum v_n$ is
convergent and that its sum is $s$.

8. The rule for multiplying two series, as given in Theorem XVII, may give a wrong
result if the series are not absolutely convergent. Verify this by taking $u_n = v_n =
(-1)^n/\sqrt{n+1}$ in the theorem and showing that $w_n$ does not approach zero as $n \to \infty$.
Suggestion: Show that, for $k = 0, 1, \ldots, n$, $(k+1)(n-k+1) \le \tfrac14 (n+2)^2$; then show that
$|w_n| \ge 2(n+1)/(n+2)$.


19.7 / DIRICHLET'S TEST

Thus far all our tests, with one exception, have been tests for series of positive 
terms, or tests for absolute convergence. The one exception is the alternating 
series test of §19.32. We shall now discuss a more general test which is useful on 
occasion for proving that a series is convergent, though not necessarily
absolutely convergent. Most tests of the character of this one depend upon an
algebraic device known as summation by parts. The device is analogous to
integration by parts.

Let $a_0, a_1, \ldots$ and $b_0, b_1, \ldots$ be two arbitrary sequences of numbers. Let

\[
s_n = a_0 + a_1 + \cdots + a_n.
\]

Then

\[
a_0b_0 + a_1b_1 + \cdots + a_nb_n
= s_0(b_0 - b_1) + s_1(b_1 - b_2) + \cdots + s_{n-1}(b_{n-1} - b_n) + s_nb_n. \tag{19.7-1}
\]

This is the identity of summation by parts. It is known as Abel's summation
identity. Niels H. Abel was a Norwegian mathematician (1802-1829). The proof
of the formula is very simple. We observe that $a_n = s_n - s_{n-1}$ if $n \ge 1$, and $a_0 = s_0$.
Therefore

\[
\begin{aligned}
a_0b_0 &= s_0b_0, \\
a_1b_1 &= s_1b_1 - s_0b_1, \\
a_2b_2 &= s_2b_2 - s_1b_2, \\
&\;\;\vdots \\
a_{n-1}b_{n-1} &= s_{n-1}b_{n-1} - s_{n-2}b_{n-1}, \\
a_nb_n &= s_nb_n - s_{n-1}b_n.
\end{aligned}
\]

Adding these results and grouping the terms on the right appropriately, we
obtain (19.7-1).
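
As a quick sanity check on (19.7-1), the following Python sketch compares the
two sides of the identity for arbitrary finite sequences. The function name
`abel_rhs` and the random test data are our own illustrative assumptions; the
code merely evaluates both sides and is no substitute for the proof above.

```python
import random


def abel_rhs(a, b):
    """Right side of (19.7-1): sum over k < n of s_k (b_k - b_(k+1)), plus s_n b_n."""
    n = len(a) - 1
    s = []          # s[k] = a_0 + a_1 + ... + a_k
    running = 0.0
    for ak in a:
        running += ak
        s.append(running)
    return sum(s[k] * (b[k] - b[k + 1]) for k in range(n)) + s[n] * b[n]


random.seed(0)
a = [random.uniform(-1, 1) for _ in range(10)]
b = [random.uniform(-1, 1) for _ in range(10)]
lhs = sum(x * y for x, y in zip(a, b))  # a_0 b_0 + ... + a_n b_n

if __name__ == "__main__":
    print(abs(lhs - abel_rhs(a, b)))  # difference is at rounding-error level
```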

We can now state a test known by the name of P. G. Lejeune Dirichlet, a 
German mathematician of the first half of the nineteenth century. 

THEOREM XVIII. (DIRICHLET'S TEST.) Consider a series of the form

\[
a_0b_0 + a_1b_1 + a_2b_2 + \cdots + a_nb_n + \cdots \tag{19.7-2}
\]

which satisfies the following conditions:

(a) the terms $b_n$ are positive, $b_{n+1} \le b_n$, and $b_n \to 0$ as $n \to \infty$;

(b) there is some constant $M$ independent of $n$ such that

\[
|a_0 + a_1 + \cdots + a_n| \le M \quad \text{for all values of } n.
\]

Then the series (19.7-2) is convergent.

Proof. Let

\[
S_n = a_0b_0 + \cdots + a_nb_n.
\]

We have to show that $\lim_{n\to\infty} S_n$ exists. Now by (19.7-1) we have

\[
S_n = T_n + s_nb_n, \tag{19.7-3}
\]

where

\[
T_n = s_0(b_0 - b_1) + \cdots + s_{n-1}(b_{n-1} - b_n).
\]

Now $|s_nb_n| \le Mb_n$, by condition (b), and so $s_nb_n \to 0$, since $b_n \to 0$. Next we show
that $\lim_{n\to\infty} T_n$ exists. The desired conclusion will then follow from (19.7-3). Now
$T_n$ is the partial sum of the series

\[
s_0(b_0 - b_1) + s_1(b_1 - b_2) + \cdots, \tag{19.7-4}
\]

and certainly $T_n$ will approach a limit if we show that the series (19.7-4) is
absolutely convergent. Now

\[
|s_0(b_0 - b_1)| + |s_1(b_1 - b_2)| + \cdots + |s_{n-1}(b_{n-1} - b_n)|
\le M(b_0 - b_1) + M(b_1 - b_2) + \cdots + M(b_{n-1} - b_n) = M(b_0 - b_n),
\]

or

\[
\sum_{k=0}^{n-1} |s_k(b_k - b_{k+1})| \le Mb_0.
\]

Here we have used the conditions (a) and (b). Since these sums are bounded, we
conclude that the series (19.7-4) is absolutely convergent. This is all that was
needed to complete the proof.


Theorem XVIII is useful in connection with trigonometric series. 

Example. The series

\[
\frac{\cos x}{1} + \frac{\cos 3x}{3} + \frac{\cos 5x}{5} + \cdots \tag{19.7-5}
\]

is convergent if $x$ is not one of the values $0, \pm\pi, \pm 2\pi, \ldots$.

To prove this we apply Theorem XVIII, taking $a_0 = 0$, $a_1 = \cos x$, $a_2 =
\cos 3x$, ..., $a_n = \cos(2n-1)x$, $n = 1, 2, \ldots$, and $b_0 = 1$, $b_n = (2n-1)^{-1}$, $n = 1,
2, \ldots$. Condition (a) is clearly satisfied, so it remains only to show that condition
(b) is satisfied. This requires a bit of ingenuity. We use the trigonometric identity

\[
2\cos A \sin B = \sin(A + B) - \sin(A - B).
\]

Taking $B = x$ and $A$ successively equal to $x, 3x, 5x, \ldots$, we have

\[
\begin{aligned}
2\cos x \sin x &= \sin 2x - 0, \\
2\cos 3x \sin x &= \sin 4x - \sin 2x, \\
2\cos 5x \sin x &= \sin 6x - \sin 4x, \\
&\;\;\vdots \\
2\cos(2n-1)x \sin x &= \sin 2nx - \sin(2n-2)x.
\end{aligned}
\]

Adding these equations, the right side telescopes and we obtain

\[
2\sin x\,[\cos x + \cos 3x + \cdots + \cos(2n-1)x] = \sin 2nx.
\]

If now $x$ is not one of the values $0, \pm\pi, \pm 2\pi, \ldots$, then $\sin x \ne 0$, and

\[
|\cos x + \cos 3x + \cdots + \cos(2n-1)x| \le \frac{1}{2|\sin x|}.
\]

Hence condition (b) of Theorem XVIII is satisfied with $M = \dfrac{1}{2|\sin x|}$. Therefore
the series (19.7-5) is convergent.
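
The boundedness of the cosine sums, which is the heart of condition (b), can be
observed numerically. In the Python sketch below the function name
`cos_partial_sum`, the sample point $x = 0.7$, and the range of $n$ are our own
illustrative choices. The code compares the partial sums with the bound
$1/(2|\sin x|)$ and with the closed form $\sin 2nx/(2\sin x)$ obtained from the
trigonometric identity used in the example.

```python
import math


def cos_partial_sum(x, n):
    """cos x + cos 3x + ... + cos (2n - 1)x."""
    return sum(math.cos((2 * k - 1) * x) for k in range(1, n + 1))


x = 0.7                                  # any x that is not a multiple of pi
bound = 1 / (2 * abs(math.sin(x)))       # the constant M of condition (b)
worst = max(abs(cos_partial_sum(x, n)) for n in range(1, 2001))

n = 25
closed_form = math.sin(2 * n * x) / (2 * math.sin(x))

if __name__ == "__main__":
    print(worst, bound)                  # worst never exceeds bound
    print(abs(cos_partial_sum(x, n) - closed_form))
```

A finite scan over $n$ proves nothing by itself, of course; it only illustrates
that the partial sums stay inside the band guaranteed by the identity.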


EXERCISES 

1. Prove that the series

\[
\frac{\sin x}{1} + \frac{\sin 3x}{\sqrt{3}} + \frac{\sin 5x}{\sqrt{5}} + \cdots
\]

is convergent for all values of $x$. Use the identity

\[
2\sin A \sin B = \cos(A - B) - \cos(A + B).
\]

2. Prove the identities

\[
2\sin\frac{x}{2}\,(\sin x + \sin 2x + \cdots + \sin nx) = \cos\frac{x}{2} - \cos\frac{2n+1}{2}x,
\]
\[
2\sin\frac{x}{2}\,(\cos x + \cos 2x + \cdots + \cos nx) = \sin\frac{2n+1}{2}x - \sin\frac{x}{2}.
\]

3. Suppose that $a_n > 0$, $a_{n+1} \le a_n$, and $a_n \to 0$ as $n \to \infty$. Prove that the series
$\sum_{n=1}^{\infty} a_n \sin nx$ is convergent for all values of $x$, and that $\sum_{n=1}^{\infty} a_n \cos nx$ is
convergent with the possible exception of the cases in which $x = 2\pi m$, $m = 0, \pm 1, \pm 2, \ldots$.
Use the identities in Exercise 2.

4. Suppose that $\sum_{n=1}^{\infty} a_n/n^p$ is convergent. Show that $\sum_{n=1}^{\infty} a_n/n^q$ is convergent if
$p < q$.

5. Deduce Theorem XII as a special case of Dirichlet's test.

6. The following theorem is known as Abel's test: Suppose $\sum a_n$ is convergent, and
that $b_n > 0$, $b_{n+1} \le b_n$. Then $\sum a_nb_n$ is convergent. Prove this.

7. Show that the series

\[
\frac{1}{2\log 2} + \frac{1}{3\log 3} - \frac{1}{4\log 4} - \frac{1}{5\log 5}
- \frac{1}{6\log 6} - \frac{1}{7\log 7} + \cdots
\]

is convergent, the rule of signs being that successive terms with the same sign come in
groups of 2, 4, 8, 16, ... . Begin by considering the series


8. Show that the series

\[
1 - \tfrac12 - \tfrac13 + \tfrac14 + \tfrac15 - \tfrac16 - \tfrac17 + \cdots
\]

is convergent. Successive terms of like sign come in groups of two.


MISCELLANEOUS EXERCISES 

1. For what values of $x$ is $\sum n^x x^n$ convergent?

2. Find the sum of the first $n$ terms of the series $\sum \log[1 + (1/n)]$. Is the series
convergent or divergent?

3. Show that $\sum (1/n)\log[1 + (1/n)]$ is convergent.

4. Show by Theorem III, or otherwise, that $\sum_{n=2}^{\infty} 1/(\log n)^c$ is divergent for all values
of $c$.

5. Express $(\log n)^{\log n}$ as a power of $n$, and use the result to show that
$\sum 1/(\log n)^{\log n}$ is convergent.

6. Examine each of the following series for convergence or divergence. 


(a) 2(-»" 


n 


(b) 2 jn-R 


(n + 1)" +1 
(w + 1)" 
n' 


(c) 




(d) 2 


fl + O/n)]' 


(a) 


7. Show that 
2 • 4 ■ • ■ 2n 


< 


3 * 5 • • ■ (2n + 1) Vn + 1 
(b) Classify the values of x according to whether the series 

n ! „ 


3-5 


(2 n + 1)‘ 


is convergent or divergent. 


8. Find all values of x for which the series 2 


1 - 3 - ■ • (2n — 1) 1 ■ 


is convergent. 


2 • 4 ■ • ■ 2 n n 

9. (a) Is $\sum \sin \pi[n + (1/n)]$ absolutely convergent? Is it convergent?

(b) Show that $\sum \sin^2 \pi[n + (1/n)]$ is convergent.

(c) For what values of $B$ is $\sum (-1)^n (1/n)\cos(B/n)$ convergent?

(d) Is $\sum [1 - \cos(\pi/n)]$ convergent or divergent?

10. Discuss the convergence of each series, classifying the values of x into those for 
which the series converges and those for which it diverges. 


, x V r iy* 1 ‘ ^ ‘ ' (2n — 1) 2 /x\ 4 " +1 

(a) „4, (_1) 2 ■ 4 ■ ■ ~2n ~ 4n+~\\ 2.) ' 


V ?2n-2 (( W ~ 1)Q 2 2n 

W n 4 2 Z (2n)! • 


(c) 2 


(n + D f 
n! 


11. Show that 2 is convergent if a > e and divergent if 0 < a ^ e. 

12. (a) If $x_n = 1 - \tfrac12 + \tfrac13 - \cdots - 1/(2n)$, show that $x_n \to \log 2$ as $n \to \infty$ by using the
definition of Euler's constant (see Exercise 6, §19.21). Hint: Show that $x_n =
C_{2n} - C_n + \log 2$ in the notation of the exercise just mentioned.

(b) Prove that the partial sums of the series

\[
1 + \tfrac12 - \tfrac13 + \tfrac14 + \tfrac15 - \tfrac16 + \cdots
\]

approach $+\infty$ as $n \to \infty$. Make use of Euler's constant.

13. Prove that, if $u_n > 0$ and $\sum u_n$ is convergent, so is $\sum u_n^2$.

14. Show that the series $\sum \log(n \sin(1/n))$ is convergent.

15. Consider the series

\[
1 - \tfrac12 - \tfrac13 + \tfrac14 + \tfrac15 + \tfrac16 + \tfrac17 - \tfrac18 - \cdots,
\]

the rule of signs being that successive terms with the same sign come in groups of 1, 2, 4,
8, 16, ... . Show that this series does not satisfy the condition of Theorem VIII, and is
therefore divergent. Show also that, nevertheless, the sum $s_n$ of the first $n$ terms of the
series satisfies the condition $0 < s_n \le 1$.

16 . Consider the series 


I 1 i 


in which successive terms of like sign come in groups of 1, 2, 3, 4, ... . Show that this 
series is convergent. 

17. Prove that the series $\sum_{n=2}^{\infty} (1/n)\cos(a \log n)$ is divergent by showing that it does
not satisfy the condition of Theorem VIII. Suggestion: Suppose $0 < \delta < \pi/2$. Let $n_1$
denote the greatest integer $\le \exp[(2\pi m - \delta)/a]$ and let $n_2$ denote the greatest integer
$\le \exp[(2\pi m + \delta)/a]$, where $m$ is a fixed positive integer, and $\exp u = e^u$. Show that
$\cos(a \log n) \ge \cos\delta > 0$ if $n_1 < n \le n_2$. Then show that


cos 

n 1 


cos(a log n ) 


1. 


" 2 +1 cos 8 . 25 cos 5 

dx > 


Now note that $n_1 \to \infty$ and $n_2 \to \infty$ if $m \to \infty$, and so finish the proof. (Assume $a > 0$.)

18. Show by an argument similar to that of Exercise 17 that the series

\[
\sum_{n=4}^{\infty} \frac{\cos(\log(\log n))}{\log n}
\]

is divergent.



20 / UNIFORM CONVERGENCE


20 / FUNCTIONS DEFINED BY CONVERGENT SEQUENCES 

In the more advanced parts of analysis we often deal with functions which are
defined by means of infinite series. Suppose that

\[
f(x) = \sum_{n=1}^{\infty} u_n(x), \tag{20-1}
\]

where it is assumed that the terms of the series are each defined on the same
interval $a \le x \le b$, and that the series is convergent for each $x$ of the interval.
The sum of the series is then also a function of $x$, and we denote it by $f(x)$. The
main purpose of this chapter is to deal with questions A-D below.

Assuming that we know a good deal about the functions $u_1(x)$,
$u_2(x)$, ..., but nothing about the function $f(x)$ except in so far as we can draw
certain conclusions from the series (20-1), we raise four questions:

A. What conditions will assure us that $f$ is continuous?

B. What conditions will assure us that $f$ is differentiable?

C. What conditions will justify us in writing

\[
\int_a^b f(x)\,dx = \int_a^b u_1(x)\,dx + \int_a^b u_2(x)\,dx + \cdots? \tag{20-2}
\]

D. What conditions will justify us in writing

\[
f'(x) = u_1'(x) + u_2'(x) + \cdots? \tag{20-3}
\]

That is, when can we integrate and differentiate an infinite series just as though it 
were a finite sum of functions? 

The answers which we shall give to all these questions are dependent on the
concept of uniform convergence. Before proceeding further, however, let us look
at the problems raised by these questions in another way. Consider the partial
sums

\[
s_n(x) = u_1(x) + u_2(x) + \cdots + u_n(x) \tag{20-4}
\]

of the series (20-1). Then

\[
f(x) = \lim_{n\to\infty} s_n(x); \tag{20-5}
\]

that is, the function $f(x)$ is the limit of a sequence of functions. Now we may
consider the notion of the limit of a sequence of functions quite apart from the
notion of a function defined by an infinite series, by presenting the sequence
directly, rather than as a sum formed from the terms of a series.


Example 1. Consider the sequence $f_n(x) = x^n$, and define

\[
f(x) = \lim_{n\to\infty} f_n(x), \qquad 0 \le x \le 1. \tag{20-6}
\]

The sequence is convergent if $0 \le x \le 1$, and we see
that (20-6) is equivalent to the definition

\[
\begin{cases}
f(x) = 0 & \text{if } 0 \le x < 1, \\
f(1) = 1.
\end{cases} \tag{20-7}
\]

The graphs of several of the functions $f_n(x)$ are shown
in Fig. 176.

Fig. 176.

We shall find that, in framing our answers to the questions A-D, the 
important consideration is that of functions defined as the limits of sequences. For 
this reason the subsequent discussion in this chapter is often phrased in terms of 
sequences rather than in terms of series. Let us rephrase the questions as 
follows: 

Assuming that a function $f(x)$ is known to us entirely by

\[
f(x) = \lim_{n\to\infty} f_n(x), \qquad a \le x \le b, \tag{20-8}
\]

we ask:

A'. What conditions on the functions $f_n$ will assure us that $f$ is continuous?

B'. What conditions will assure us that $f$ is differentiable?

C'. Under what conditions will it be true that

\[
\int_a^b f(x)\,dx = \lim_{n\to\infty} \int_a^b f_n(x)\,dx? \tag{20-9}
\]

D'. Under what conditions will it be true that

\[
f'(x) = \lim_{n\to\infty} f_n'(x)? \tag{20-10}
\]

In the particular case when $f_n(x) = s_n(x)$ and $s_n(x)$ is defined by (20-4), the
questions A'-D' are the same as the questions A-D.

We do not aim to get answers to these questions in the form of necessary 
and sufficient conditions; the conditions which we shall impose will be sufficient 
but not necessary. 

Let us examine the situation in Example 1 with respect to question A'. Here
we observe that each of the functions $f_n(x) = x^n$ is continuous on the closed
interval $[0, 1]$; nevertheless, the limit function $f(x)$ is not continuous on the whole
interval, but is discontinuous at the point $x = 1$. From this example we can
conclude that in question A' the assumption that each of the functions $f_n$ is
continuous on $[a, b]$ is not sufficient to assure us that the limit function $f$ is
continuous on $[a, b]$. Something more is needed, not about the functions $f_n$
themselves, but about the way in which $f_n(x)$ converges to $f(x)$.

Next we consider an example which will be instructive with respect to
questions B' and D'.

Example 2. Let $f_n(x) = x^n/n$, $n = 1, 2, \ldots$.

On the interval $0 \le x \le 1$, $0 \le f_n(x) \le 1/n$, and hence

\[
\lim_{n\to\infty} f_n(x) = 0, \qquad 0 \le x \le 1.
\]

The limit function $f(x)$ has the value zero for each $x$ in the interval $[0, 1]$. Thus $f$
is differentiable. But consider the sequence of derivatives $f_n'(x) = x^{n-1}$. The
successive members of this sequence are $1, x, x^2, x^3, \ldots$, and as we saw in
Example 1,

\[
\lim_{n\to\infty} f_n'(x) = 0 \quad \text{if } 0 \le x < 1,
\]
\[
\lim_{n\to\infty} f_n'(1) = 1.
\]

Thus

\[
\lim_{n\to\infty} f_n'(1) \ne f'(1).
\]

This example shows us that in question D' the relation (20-10) may fail to be true
(at least for some values of $x$ in the interval) even when the functions $f_n$ and $f$
are differentiable and the sequence of derivatives $f_n'(x)$ is convergent for every $x$
in the interval.

Finally we give an example bearing on question C'. 

Example 3. Let us define a sequence $f_n(x)$ so that its graph is as indicated in
Fig. 177. That is, the graph of $y = f_n(x)$ for $0 \le x \le 1$ consists of three line
segments: the line $y = 4n^2x$ from $x = 0$ to $x = 1/(2n)$,
the line $y = -4n^2x + 4n$ from $x = 1/(2n)$ to $x = 1/n$, and
the line $y = 0$ from $x = 1/n$ to $x = 1$. The high point of
the graph is at $x = 1/(2n)$, the height being $2n$. From the
definition we see that, for each $x$ in $[0, 1]$,

\[
\lim_{n\to\infty} f_n(x) = 0.
\]

Fig. 177.

For, $f_n(0) = 0$ for all $n$, and if $0 < x \le 1$, $f_n(x) = 0$ as
soon as $n$ is large enough to make $1/n < x$. Thus the
limit function $f(x)$ has the value zero for each $x$, and
accordingly

\[
\int_0^1 f(x)\,dx = 0.
\]

But

\[
\int_0^1 f_n(x)\,dx = 1
\]

for each value of $n$, because the integral is just equal to the area of a triangle of
base $1/n$ and height $2n$. Therefore

\[
\int_0^1 f(x)\,dx \ne \lim_{n\to\infty} \int_0^1 f_n(x)\,dx.
\]

This shows that for the truth of (20-9) in question C' we must have something
more than the mere convergence of $f_n(x)$ to $f(x)$, even when the functions $f_n$ and
$f$ are all continuous.
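
The behavior in Example 3 can be reproduced numerically. The Python sketch
below is an illustration only: the names `tent` and `trapezoid`, the sample
point $x = 0.3$, and the step counts are our own assumptions. It evaluates the
piecewise linear function $f_n$ and integrates it by the trapezoidal rule on a
grid chosen to contain the breakpoints $1/(2n)$ and $1/n$, so that the rule is
exact up to rounding.

```python
def tent(n, x):
    """The function f_n of Example 3 on 0 <= x <= 1."""
    if x <= 1 / (2 * n):
        return 4 * n * n * x              # rising segment y = 4 n^2 x
    if x <= 1 / n:
        return -4 * n * n * x + 4 * n     # falling segment y = -4 n^2 x + 4 n
    return 0.0                            # zero from x = 1/n to x = 1


def trapezoid(f, a, b, steps):
    """Composite trapezoidal rule with `steps` equal subintervals."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


def area_under_tent(n):
    # 4000 * n steps puts grid points at 1/(2n) and 1/n
    return trapezoid(lambda x: tent(n, x), 0.0, 1.0, 4000 * n)


if __name__ == "__main__":
    for n in (1, 5, 20):
        print(n, area_under_tent(n), tent(n, 0.3))
```

The integral column stays at 1 while the value at the fixed point $x = 0.3$
drops to 0 as soon as $1/n < 0.3$, which is exactly the failure of (20-9)
described above.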

In concluding this section let us observe that any function which is defined
by a convergent sequence of functions may also be defined by a convergent
series of functions. For if $f(x) = \lim_{n\to\infty} f_n(x)$, let us form the series

\[
f_1(x) + [f_2(x) - f_1(x)] + [f_3(x) - f_2(x)] + \cdots, \tag{20-11}
\]

that is, define the terms $u_n(x)$ by the formulas $u_1(x) = f_1(x)$, and $u_n(x) =
[f_n(x) - f_{n-1}(x)]$ if $n > 1$. Then

\[
s_n(x) = f_1(x) + [f_2(x) - f_1(x)] + \cdots + [f_n(x) - f_{n-1}(x)],
\]

and when we simplify the right side of this equation we find that $s_n(x) = f_n(x)$.
Thus the series (20-11) has the sum $f(x)$.

For instance, the limit function $f(x)$ in Example 1 can be expressed as the
infinite series

\[
f(x) = x + (x^2 - x) + (x^3 - x^2) + (x^4 - x^3) + \cdots.
\]


20.1 / THE CONCEPT OF UNIFORM CONVERGENCE 

Let us begin by recalling the definition of the limit of a sequence, as it applies to
a sequence of functions. Suppose that

\[
f(x) = \lim_{n\to\infty} f_n(x) \tag{20.1-1}
\]

for each $x$ of the interval $a \le x \le b$. According to the definition in §1.62 this
means that if $\epsilon$ is any positive number and if $x$ is any point of the interval, there
is some integer $N$, the size of which will usually depend on $\epsilon$ and $x$, such that

\[
|f_n(x) - f(x)| < \epsilon \tag{20.1-2}
\]

if $N \le n$. The inequality (20.1-2) is equivalent to the two inequalities

\[
f(x) - \epsilon < f_n(x) < f(x) + \epsilon. \tag{20.1-3}
\]

Let us examine in a particular case the way in which the choice of $N$ may
depend on $\epsilon$ and $x$. Consider, for instance, $f_n(x) = x^n$. We saw in Example 1 of
§20 that $f_n(x) \to f(x)$ if $0 \le x \le 1$, where $f(x) = 0$ if $0 \le x < 1$ and $f(1) = 1$. Suppose
that $0 < \epsilon < 1$ and $0 < x < 1$. Then (20.1-2) is equivalent to each of the following
inequalities:

\[
x^n < \epsilon,
\]
\[
n \log x < \log \epsilon,
\]
\[
\log(1/\epsilon) < n \log(1/x),
\]
\[
\frac{\log(1/\epsilon)}{\log(1/x)} < n. \tag{20.1-4}
\]

(Observe that $\log x < 0$ and $\log(1/x) > 0$ if $0 < x < 1$.) Now, to have (20.1-4) true
for all $n \ge N$ it is necessary to choose the integer $N$ large enough so that

\[
\frac{\log(1/\epsilon)}{\log(1/x)} < N. \tag{20.1-5}
\]

The dependence on $\epsilon$ and $x$ shows up clearly here. As $\epsilon \to 0$, $\log(1/\epsilon) \to +\infty$, and
hence $N \to \infty$. Also, for a fixed $\epsilon$, as $x \to 1^-$, $\log(1/x) \to 0^+$ and we see from
(20.1-5) that $N \to \infty$. There is no value of $N$ such that (20.1-5) holds
simultaneously for all values of $x$ in the range $0 < x < 1$.
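
The growth of $N$ as $x \to 1^-$ is easy to tabulate. The Python sketch below is
illustrative only; the function name `smallest_N` and the sample values of
$\epsilon$ and $x$ are our own choices. It computes the smallest integer $N$
satisfying (20.1-5) and confirms that $x^N < \epsilon$ for that $N$.

```python
import math


def smallest_N(eps, x):
    """Smallest integer N with log(1/eps) / log(1/x) < N, for 0 < eps < 1, 0 < x < 1."""
    return math.floor(math.log(1 / eps) / math.log(1 / x)) + 1


if __name__ == "__main__":
    eps = 0.01
    for x in (0.5, 0.9, 0.99, 0.999):
        N = smallest_N(eps, x)
        print(x, N, x ** N < eps)
```

With $\epsilon$ held fixed, the required $N$ increases without bound as $x$
approaches 1, so no single $N$ serves every $x$ in $0 < x < 1$.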

The foregoing situation illustrates what we shall call nonuniform
convergence. We shall now define what we mean by uniform convergence.

Definition. Suppose that the sequence of functions $\{f_n(x)\}$ and the function $f(x)$
are defined on the interval $[a, b]$ and satisfy the following condition: To each
$\epsilon > 0$ corresponds some integer $N$ such that, for every $x$ in the interval,

\[
|f_n(x) - f(x)| < \epsilon
\]

provided that $N \le n$. Then we say that $f_n(x)$ converges uniformly to $f(x)$ on the
interval $[a, b]$. The essential thing is that $N$ is to be independent of $x$; it will
usually depend on $\epsilon$, however.

Example 1. If $f_n(x) = x^n/n$, then $\lim_{n\to\infty} f_n(x) = 0$, the convergence being
uniform, on the interval $0 \le x \le 1$. For, if $x$ is on the stated interval,

\[
|f_n(x) - 0| = \frac{x^n}{n} \le \frac{1}{n} < \epsilon
\]

if $1/\epsilon < n$. Hence we may take $N$ as the smallest integer greater than $1/\epsilon$. This
choice is independent of $x$.

Example 2. The convergence in Example 3 of §20 is not uniform on
$0 \le x \le 1$. For, since $f(x) = 0$ for each value of $x$ involved, uniform convergence
would mean that $|f_n(x)| < \epsilon$ when $n \ge N$, where $N$ depends only on $\epsilon$, not on $x$.
In particular, then, we should have

\[
\left|f_n\!\left(\frac{1}{2n}\right)\right| < \epsilon
\]

if $n$ is sufficiently large. But $\left|f_n\!\left(\frac{1}{2n}\right)\right| = 2n$, and certainly the inequality
$2n < \epsilon$ is false if $n$ is sufficiently large.
This is a case where the choice of $N$ depends on $x$, and as $x \to 0$, $N \to \infty$.


Uniform convergence may be portrayed graphically by interpreting the
inequalities (20.1-3). Let the graph of $y = f(x)$ be displaced upward by the
addition of $\epsilon$. This gives us the graph of $y = f(x) + \epsilon$. Similarly, a downward
displacement of amount $\epsilon$ gives us the graph of $y = f(x) - \epsilon$. Let us visualize
these two displaced graphs, on the assumption that the function $f$ is continuous.
Considering the portion of the $xy$-plane between these two curves, for $a \le x \le b$,
we have a ribbon-like region of width $2\epsilon$ in the $y$-direction (see Fig. 178). The
inequalities (20.1-3) state that the graph of $y = f_n(x)$ lies within the ribbon
throughout its length. For each $\epsilon > 0$ there is a ribbon; to have uniform
convergence the graph of $y = f_n(x)$ must lie entirely in the ribbon when $n$ is large
enough ($n \ge N$), and this must be true for each $\epsilon$.

Fig. 178.

The concept of uniform convergence relates to a whole set of values of the 
variable x; we have stated the definition for the case in which the set of values is 
a closed interval. The definition is essentially the same for any interval, whether 
open, closed, or neither. The crux of the matter is to have N the same for all 
values of x in the given interval or set. 

Example 3. Let /„(x) = J 7 j~ 2 p- Here we have 

lim f n (x) = 0 

n-MK 

for all values of x. The convergence is uniform in any closed interval which does 
not include x = 0, but it is not uniform in any interval having x = 0 in its interior 
or at one end. We shall investigate these statements about uniform convergence. 
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Fig. 179. 


The function $f_n$ is odd, i.e., $f_n(-x) = -f_n(x)$, so that the graph of $y = f_n(x)$ is
symmetric with respect to the origin. Hence we confine our attention to values
of x for which $x \ge 0$. The graph is easily constructed with the aid of information
obtained from the derivative. An easy calculation shows that

$$f_n'(x) = \frac{n(1 - n^2 x^2)}{(1 + n^2 x^2)^2}.$$

The graph of $y = f_n(x)$, for $x > 0$, rises to a maximum at $x = 1/n$, and then
diminishes toward zero as $x \to +\infty$. The maximum value is $f_n(1/n) = \tfrac{1}{2}$. The
graphs for n = 1, 2, 12 are shown in Fig. 179. It is clear from this figure that the
convergence is nonuniform in any interval containing the point x = 0, for such an
interval will contain the point $x = 1/n$ if n is large enough, and so we can never
have $|f_n(x)| < \epsilon$ for all values of x in this interval if $\epsilon < \tfrac{1}{2}$, no matter how large we
take n. However, the convergence is uniform for all values of x such that $x \ge \delta$,
where $\delta$ is any fixed positive number. For, $\epsilon > 0$ being given, if we choose N so
large that

$$\frac{1}{N} < \delta \quad \text{and} \quad \frac{n\delta}{1 + n^2 \delta^2} < \epsilon$$

if $N \le n$, then it will also be true that

$$\frac{nx}{1 + n^2 x^2} < \epsilon$$

if $N \le n$ and $\delta \le x$, so that the convergence is uniform for these latter values of
x. The situation is illustrated in Fig. 180.
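The contrast drawn in Example 3 can be checked numerically. The following is a minimal sketch, not part of the original text: it samples $f_n(x) = nx/(1 + n^2x^2)$ on a grid and compares the largest value found on [0, 1] with the largest value found on $[\delta, 1]$. The grid size, the upper endpoint 1, and the value $\delta = 0.1$ are illustrative assumptions.

```python
def f(n, x):
    """Term of the sequence in Example 3: f_n(x) = n x / (1 + n^2 x^2)."""
    return n * x / (1.0 + (n * x) ** 2)

def sup_on_grid(n, lo, hi, points=20001):
    """Approximate sup |f_n| over [lo, hi] by sampling a uniform grid."""
    step = (hi - lo) / (points - 1)
    return max(abs(f(n, lo + k * step)) for k in range(points))

delta = 0.1  # hypothetical choice of the fixed positive number delta
for n in (1, 10, 100, 1000):
    # First column: an interval containing 0. Second column: x >= delta only.
    print(n, sup_on_grid(n, 0.0, 1.0), sup_on_grid(n, delta, 1.0))
```

On the interval containing 0 the sampled maximum stays near 1/2 for every n, while on $[\delta, 1]$ it shrinks once $1/n < \delta$, which is the behaviour the example describes.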

When we are dealing with infinite series, we say that a series of functions, 
each defined on a certain fixed interval, is uniformly convergent on that interval 
provided that the sequence of partial sums s n (x) is uniformly convergent on the 
interval. 

Sometimes it is convenient to express the condition for uniform convergence
without reference to the limit function $f(x)$, but entirely in terms of the sequence
$\{f_n(x)\}$.

THEOREM I. Let the functions of the sequence $\{f_n(x)\}$ be defined on the interval
[a, b]. In order that the sequence be uniformly convergent to a limit function
$f(x)$ on the interval, it is necessary and sufficient that to each $\epsilon > 0$ there
correspond some integer N independent of x such that

$$|f_n(x) - f_m(x)| < \epsilon \tag{20.1-6}$$

whenever $N \le m$ and $N \le n$.

The proof is a direct consequence of Cauchy’s convergence criterion
(Theorem VI, §16.5). We shall prove only the sufficiency of condition (20.1-6),
and leave the proof of the necessity to the student. It follows from (20.1-6) by
Cauchy’s criterion that the sequence $\{f_n(x)\}$ is convergent for each x of the
interval, and so defines a limit function $f(x)$. We may then let $m \to \infty$ in (20.1-6)
and obtain the result that $|f_n(x) - f(x)| \le \epsilon$ if $N \le n$. Since N is independent of x,
this shows that the convergence is uniform.

Later on in this chapter we shall see that answers to questions A-D or A'-D'
may be stated in terms of the concept of uniform convergence.

EXERCISES 

1. Discuss the functions $f(x) = \lim_{n \to \infty} f_n(x)$ defined by the following sequences for
the values of x indicated. In each case sketch several typical graphs $y = f_n(x)$, showing
the character of the approximation of $f(x)$ by $f_n(x)$ for rather large values of n. Indicate
the discontinuities, if any, of $f(x)$, and answer the particular questions in each case.

(a) $f_n(x) = (\sin x)^n$, $0 \le x \le \pi$. Is the convergence uniform? Explain.

(b) $f_n(x) = (\sin x)^{1/n}$, $0 \le x \le \pi$. For what values of a and b is the convergence uniform on
$a \le x \le b$?

(c) $f_n(x) = (1/n)e^{-n^2 x^2}$, all x. Is the convergence uniform? Prove your answer.

(d) $f_n(x) = nxe^{-nx}$, $x \ge 0$. Prove that the convergence is uniform for $x \ge \delta$, where $\delta$ is any
positive number. Why is the convergence not uniform on the interval $0 \le x \le \delta$?

(e) $f_n(x) = \tan^{-1} nx$, all x. What conditions must a and b satisfy if the convergence is to
be uniform on $a \le x \le b$?

(f) $f_n(x) = (x/n)e^{-x/n}$, $x \ge 0$. For given positive $\epsilon$ and A, find N such that $|f_n(x) - f(x)| < \epsilon$
if $0 \le x \le A$ and $N \le n$. What is the corresponding verbal statement about uniform
convergence? Is the convergence uniform for all x such that $x \ge 0$?

2. Follow the general directions of Exercise 1 in the case of $f_n(x) = x^n/(1 + x^{2n})$,
$x \ge 0$. Get what help you can from examination of $f_n'(x)$. Show that $|f_n(x)| \le x^n$ if $x \ge 0$,
and $|f_n(x)| \le x^{-n}$ if $x > 0$. Use these inequalities to prove that the sequence converges
uniformly if $0 \le x \le a < 1$, and also if $1 < b \le x$, where a and b are constants. Is the
convergence uniform in the interval \ ^ x ^ i? 

3. Let $f_n(x) = x^{a_n}$, $a_n = \tfrac{1}{2} + \tfrac{1}{4} + \cdots + 1/2^n$.

(a) Find $f(x) = \lim_{n \to \infty} f_n(x)$.

(b) Show that $|f_n(x) - f(x)| \le 1 - a_n$ if $0 \le x \le 1$. What do you infer about uniform
convergence? Suggestion: Consider where $f_n(x) - f(x)$ is a maximum on the interval
$0 \le x \le 1$.

4. (a) Find a simple expression for the function $f(x) = \sum_{n=0}^{\infty} x^2/(1 + x^2)^n$ when $x \ne 0$.
What is $f(0)$? Does $f(x)$ have any discontinuities?

(b) If $s_n(x)$ is the sum of the first n terms of the series, show that $0 < f(x) - s_n(x) \le
1/(1 + \delta^2)^{n-1}$ if $x^2 \ge \delta^2 > 0$. What do you conclude about uniform convergence?

(c) Make a sketch showing the appearance of the curve $y = s_n(x)$ for a very large value of
n. Is the series uniformly convergent when $|x| \le \delta$?

5. Consider the function

$$f(x) = \frac{x}{x+1} + \left( \frac{2x}{2x+1} - \frac{x}{x+1} \right) + \left( \frac{3x}{3x+1} - \frac{2x}{2x+1} \right) + \cdots.$$

(a) Find the sum $s_n(x)$ of the first n terms of the series and hence find the value of $f(x)$
for each x. (b) Plot the curves $y = s_n(x)$ for several values of n, indicating how they
look for large values of n. Is the convergence uniform when $0 \le x \le 1$? when $-1 \le x \le 0$?
when $1 \le x \le 2$? (c) Show that $|s_n(x) - f(x)| \le 1/(n\delta - 1)$ if $|x| \ge \delta > 0$ and $n\delta > 1$. What can
you conclude about uniform convergence from this?

6. Give a proof of the necessity of condition (20.1-6) in Theorem I. 

7. Suppose $f_0(x)$ is continuous when $0 \le x \le a$, and $f_n(x) = \int_0^x f_{n-1}(t)\,dt$. Prove that
$f_n(x)$ converges uniformly to 0 when $0 \le x \le a$.

8. Prove that the series $\sum_{n=1}^{\infty} (-1)^{n+1}/(n + x^2)$ is uniformly convergent for all values
of x. Also prove that it is not absolutely convergent for any value of x.

20.2 / A COMPARISON TEST FOR UNIFORM CONVERGENCE 

We shall now give a theorem which is useful for proving that a series is 
uniformly convergent. While a series may be uniformly convergent without 
meeting the requirements of the theorem, it is nevertheless true that in a large 
number of important cases the theorem provides the simplest practical method 
for establishing the fact of uniform convergence. 

THEOREM II. Let the terms of the series

$$u_1(x) + u_2(x) + \cdots + u_n(x) + \cdots \tag{20.2-1}$$

be defined on an interval. Let

$$M_1 + M_2 + \cdots + M_n + \cdots \tag{20.2-2}$$

be a convergent series of positive constants, and let the inequality $|u_n(x)| \le M_n$
be satisfied for all values of n and for all x on the interval. Then the series
(20.2-1) is uniformly convergent on the interval. The word “interval” may be
replaced by “point set” and the theorem is still true.

Example 1. The geometric series

$$1 + x + x^2 + x^3 + \cdots + x^n + \cdots \tag{20.2-3}$$

is uniformly convergent on the interval $-r \le x \le r$ if $0 < r < 1$. For, let

$$u_n(x) = x^n, \quad M_n = r^n.$$

Then, on the stated interval, $|x| \le r$ and so $|u_n(x)| \le M_n$. Since the series $\sum r^n$ is
convergent, the uniform convergence of (20.2-3) follows by virtue of Theorem
II.

We may note that although the series (20.2-3) converges at each point x of
the open interval $-1 < x < 1$, it does not converge uniformly on that interval. For
the partial sums of (20.2-3) may be written

$$1 + x + x^2 + \cdots + x^{n-1} = \frac{1 - x^n}{1 - x},$$

and the sequence $(1 - x^n)/(1 - x)$ is not uniformly convergent in the interval
$0 \le x < 1$, as may be seen by referring to the discussion in the second paragraph
of §20.1.
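As an informal numerical companion to Example 1, the sketch below (an illustration, not the authors' method) compares the error of the n-term partial sum of the geometric series with the tail $r^n/(1 - r)$ of the comparison series on $[-r, r]$, and then evaluates the error at a point that drifts toward 1 as n grows. The values r = 0.9 and n = 200 and the sampling grid are arbitrary assumptions.

```python
def partial_sum(x, n):
    """s_n(x) = 1 + x + ... + x^(n-1), the sum of the first n terms."""
    return sum(x ** k for k in range(n))

def error(x, n):
    """|1/(1-x) - s_n(x)|, which equals |x|^n / (1 - x) for |x| < 1."""
    return abs(1.0 / (1.0 - x) - partial_sum(x, n))

r, n = 0.9, 200  # illustrative values with 0 < r < 1
m_tail = r ** n / (1.0 - r)  # tail of the comparison series: sum of r^k for k >= n
worst_inside = max(error(-r + 2 * r * k / 1000, n) for k in range(1001))
near_one = error(1.0 - 1.0 / n, n)  # a point that moves toward 1 as n grows
print(m_tail, worst_inside, near_one)
```

The sampled error on $[-r, r]$ never exceeds the x-independent tail, whereas the error at $x = 1 - 1/n$ stays large, in line with the remark that the convergence is not uniform on the open interval.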

Proof of Theorem II. Let

$$s_n(x) = u_1(x) + \cdots + u_n(x).$$

Then, if $m < n$, $s_n(x) - s_m(x) = u_{m+1}(x) + \cdots + u_n(x)$, and

$$|s_n(x) - s_m(x)| \le M_{m+1} + \cdots + M_n,$$

since $|u_k(x)| \le M_k$. Now, if $\epsilon > 0$, there is some integer N such that

$$M_{m+1} + \cdots + M_n < \epsilon$$

if $N \le m < n$ (see Theorem VIII, §19.3). Since the M’s are constants, the choice
of N is completely independent of x. But then we have

$$|s_n(x) - s_m(x)| < \epsilon$$

if $N \le m < n$; by Theorem I this establishes the uniform convergence of the
series (20.2-1).

Theorem II is often called the M-test, or the Weierstrass M-test. 

Example 2. The series

$$\frac{\sin x}{1^2} + \frac{\sin 2x}{2^2} + \frac{\sin 3x}{3^2} + \cdots + \frac{\sin nx}{n^2} + \cdots$$

is uniformly convergent on the entire x-axis. Here the comparison series may be
taken to be

$$\frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \cdots,$$

since $|\sin nx| \le 1$ for all n and all x. The series of constants is known to be
convergent.
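The role of the comparison series in the M-test can be seen in a small experiment. This is a hedged sketch only: it takes the series with terms $\sin nx / n^2$, uses a long partial sum (M terms) as a stand-in for the full sum, and checks that the gap between the N-term and M-term sums is bounded by the corresponding block of the comparison series $\sum 1/n^2$, whatever x is. The values of N and M and the sample points are assumptions made for illustration.

```python
import math

def partial_sum(x, terms):
    """Sum of sin(n x) / n^2 for n = 1, ..., terms."""
    return sum(math.sin(n * x) / n ** 2 for n in range(1, terms + 1))

N, M = 50, 2000  # M > N; the M-term sum stands in for the full series
xs = [-10.0 + 0.05 * k for k in range(401)]  # sample points on the x-axis
worst_gap = max(abs(partial_sum(x, M) - partial_sum(x, N)) for x in xs)
bound = sum(1.0 / n ** 2 for n in range(N + 1, M + 1))  # does not depend on x
print(worst_gap, bound, 1.0 / N)
```

The bound is the same for every x, which is exactly what makes the choice of N independent of x in the proof of Theorem II.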


EXERCISES 

1. Prove what you can about the uniform convergence of each of the following 
series by the method of Theorem II. State specifically the range of values of x about 
which you make an assertion. 


(a) 2 


i n + 1 ‘ 

cosx , cos 3 jc . cos 5x , 

(b) — — — H — i — : — I ; — - — (- 


(d) 2 


1*2 3-4 


5-6 




(c) s 


~i n 2 +x 2 


2. Prove that the series $\sum_{n=1}^{\infty} \dfrac{2x}{n^2 - x^2}$ is uniformly convergent on any finite closed
interval not containing any of the points ±1, ±2, ....

3. Show that the series $\sum 1/n^x$ converges uniformly for values of x such that
$x \ge p$, if $p > 1$.

4. Suppose that $a_n \ge 0$ and that the series $\sum a_n n^{-x}$ is convergent when $x = x_0$.
Show that it converges for values of x such that $x \ge x_0$.

5. Consider the series 2 yTl — "V 

(a) Show that it converges uniformly when x ^ 0. 

(b) Show that it converges uniformly if |x| ^ a , where 0 < a < 1. 

(c) Show that it converges uniformly if b ^ x, where —l<b. 

(d) Show that it converges uniformly if x ^ - c, where c > 1. 


20.3 / CONTINUITY OF THE LIMIT FUNCTION 

We now return to the questions A and A' of §20. 

THEOREM III. Let the functions $\{f_n(x)\}$ be defined on the interval $a \le x \le b$,
and let them converge uniformly on this interval to a limit function $f(x)$.
Then, if each of the functions $f_n$ is continuous at a point $x_0$, the limit function
$f$ is also continuous at $x_0$. In particular, if each $f_n$ is continuous on the whole
interval, so is $f$.

Proof. We write

$$f(x) - f(x_0) = (f(x) - f_n(x)) + (f_n(x) - f_n(x_0)) + (f_n(x_0) - f(x_0)),$$

$$|f(x) - f(x_0)| \le |f(x) - f_n(x)| + |f_n(x) - f_n(x_0)| + |f_n(x_0) - f(x_0)|. \tag{20.3-1}$$

Now let $\epsilon > 0$ be given; $x_0$ is fixed, and we are to show that $|f(x) - f(x_0)| < \epsilon$
provided x is sufficiently close to $x_0$, for this is what is meant by saying that $f$ is
continuous at $x_0$. Now, by the uniform convergence, we can choose n, independent
of x, so large that

$$|f_n(x) - f(x)| < \frac{\epsilon}{3}$$

for every x in the interval. If this is done, (20.3-1) shows that

$$|f(x) - f(x_0)| < |f_n(x) - f_n(x_0)| + \tfrac{2}{3}\epsilon. \tag{20.3-2}$$

Now n has been fixed; since $f_n$ is continuous at $x_0$, we shall have $|f_n(x) - f_n(x_0)| <
\epsilon/3$ if x is sufficiently close to $x_0$. But then by (20.3-2) we shall have $|f(x) -
f(x_0)| < \epsilon$. This completes the proof.

As a corollary of Theorem III we have the following assertion about series:
If the terms of an infinite series are continuous on an interval $a \le x \le b$, and if
the series is uniformly convergent on the interval, the function defined as the sum
of the series is continuous on the interval.
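Theorem III is often used in contrapositive form: if continuous functions converge to a discontinuous limit, the convergence cannot be uniform. The sketch below illustrates this with the sequence $f_n(x) = x^n$ on [0, 1], which is a hypothetical test case chosen here for illustration and is not taken from this section. The grid-based supremum is only an approximation.

```python
def f(n, x):
    """Hypothetical test sequence f_n(x) = x^n on [0, 1]; continuous for every n."""
    return x ** n

def limit(x):
    """Pointwise limit: 0 for 0 <= x < 1 and 1 at x = 1, so it jumps at x = 1."""
    return 1.0 if x == 1.0 else 0.0

def sup_gap(n, points=10001):
    """Approximate sup |f_n - limit| on [0, 1] by sampling a uniform grid.
    The true supremum equals 1 for every n; a finite grid slightly underestimates it."""
    xs = [k / (points - 1) for k in range(points)]
    return max(abs(f(n, x) - limit(x)) for x in xs)

for n in (1, 10, 100):
    print(n, sup_gap(n))
```

Since the gap does not shrink toward zero, no single N can serve all x at once, which agrees with what the theorem forces when the limit function has a jump.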

EXERCISES 

1. (a) Find $f(x) = \lim_{n \to \infty} x^{1/(2n-1)}$ for each value of x. From the nature of $f(x)$ what
do you conclude about uniform convergence of the sequence in intervals containing the
origin?

(b) Proceed as in (a) with $f(x) = \lim_{n \to \infty} x^{2/(2n-1)}$.

2. From an examination of the function $f(x) = \lim_{n \to \infty} 1/(1 + x^{2n})$ and use of Theorem
III, state what you conclude about lack of uniform convergence of the sequence. Prove
what you can of an affirmative nature about uniform convergence of the sequence.

3. If /(x) = 1 + x n / n, find lim x ^ 0 /(x) and justify your answer. 

4. Is the function $f(x) = \sum_{n=0}^{\infty} x e^{-nx^2}$ continuous at x = 0? Explain your answer.

5. Explain why the function 

is continuous except at the points x = 0, ±1, ±2, What is lim x ^o/(x)? 


20.4 / INTEGRATION OF SEQUENCES AND SERIES 

The questions C and C' of §20 may be answered, in one way at least, by the 
following theorem. 

THEOREM IV. Let the functions $f_n(x)$ be continuous on the closed interval
$a \le x \le b$, and let them converge uniformly on this interval to the limit
function $f(x)$. Then

$$\int_a^b f(x)\,dx = \lim_{n \to \infty} \int_a^b f_n(x)\,dx. \tag{20.4-1}$$

For infinite series this is equivalent to the statement that if the series

$$f(x) = \sum_{n=1}^{\infty} u_n(x)$$

is uniformly convergent on the interval $a \le x \le b$, and if the terms $u_n(x)$ are
continuous on the interval, then

$$\int_a^b f(x)\,dx = \sum_{n=1}^{\infty} \int_a^b u_n(x)\,dx. \tag{20.4-2}$$


Proof. The proof is quite easy. Consider (20.4-1).

$$\int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx = \int_a^b [f_n(x) - f(x)]\,dx,$$

$$\left| \int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx \right| \le \int_a^b |f_n(x) - f(x)|\,dx. \tag{20.4-3}$$

Now suppose $\epsilon > 0$ is given. Choose N, independent of x, so large that if $N \le n$
we have

$$|f_n(x) - f(x)| < \frac{\epsilon}{b - a} \tag{20.4-4}$$

if $a \le x \le b$. This is possible because of the assumed uniform convergence. Now
from (20.4-3) and (20.4-4) we see that

$$\left| \int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx \right| < \epsilon \tag{20.4-5}$$

if $N \le n$, for

$$\int_a^b |f_n(x) - f(x)|\,dx < \int_a^b \frac{\epsilon}{b - a}\,dx = \epsilon.$$

But (20.4-5) is just the condition which means that, by definition, (20.4-1) is true.

For the case of the infinite series, (20.4-1) becomes, with $f_n(x)$ as the partial
sums,

$$\int_a^b f(x)\,dx = \lim_{n \to \infty} \int_a^b [u_1(x) + \cdots + u_n(x)]\,dx
= \lim_{n \to \infty} \left[ \int_a^b u_1(x)\,dx + \cdots + \int_a^b u_n(x)\,dx \right].$$

This last relation, in turn, is just another way of expressing (20.4-2).

Example. We saw in Example 1, §20.2, that the geometric series

$$\frac{1}{1 - x} = 1 + x + \cdots + x^n + \cdots$$

is uniformly convergent if $-r \le x \le r$, where $0 < r < 1$. Let us then apply
(20.4-2), integrating from $-r$ to $r$. The result is

$$\int_{-r}^{r} \frac{dx}{1 - x} = \int_{-r}^{r} dx + \int_{-r}^{r} x\,dx + \cdots + \int_{-r}^{r} x^n\,dx + \cdots. \tag{20.4-6}$$

Now

$$\int_{-r}^{r} \frac{dx}{1 - x} = -\log(1 - x) \Big|_{-r}^{r} = \log \frac{1 + r}{1 - r},$$

$$\int_{-r}^{r} x^n\,dx = \frac{r^{n+1}}{n + 1} - \frac{(-r)^{n+1}}{n + 1}.$$

The latter expression is equal to 0 if n + 1 is even, and equal to $2r^{n+1}/(n + 1)$ if
n + 1 is odd. Therefore (20.4-6) becomes

$$\log \frac{1 + r}{1 - r} = 2 \left( r + \frac{r^3}{3} + \frac{r^5}{5} + \cdots \right).$$

This result was obtained by a different method in Example 1, §19.6. 
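The expansion of $\log\frac{1+r}{1-r}$ obtained by term-by-term integration is easy to test numerically. The following is a small sketch under the assumption that r = 0.5 is a representative value; the function name `log_series` is introduced here only for illustration.

```python
import math

def log_series(r, terms):
    """2 * (r + r^3/3 + r^5/5 + ...), truncated after the given number of terms."""
    return 2.0 * sum(r ** (2 * k + 1) / (2 * k + 1) for k in range(terms))

r = 0.5  # any value with 0 < r < 1 may be used
exact = math.log((1.0 + r) / (1.0 - r))
for terms in (1, 3, 10, 30):
    print(terms, log_series(r, terms), exact)
```

The partial sums approach the logarithm quickly for moderate r and more slowly as r nears 1, where the interval $[-r, r]$ approaches the ends of the interval of convergence.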

As was suggested by this example, Theorem IV has important applications in 
deriving certain series expansions from other series expansions by integration. 

The conclusions (20.4-1) and (20.4-2) of Theorem IV may be false if the
convergence is not uniform. This is illustrated by Example 3, §20, in which the
convergence is not uniform. To have (20.4-1) and (20.4-2) it is sufficient to have
uniform convergence; but uniform convergence is not a necessary condition in
all cases. Suppose, for example, that we modify Example 3 of §20 by taking the
height of the triangle in the graph of $y = f_n(x)$ to be $2\sqrt{n}$ instead of $2n$. The
convergence is still nonuniform, and $\lim_{n \to \infty} f_n(x) = 0$ when $0 \le x \le 1$. But now

$$\int_0^1 f_n(x)\,dx = \frac{1}{\sqrt{n}},$$

and so (20.4-1) is true in this case.
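The two triangle sequences can be compared numerically. The sketch below is illustrative only: it assumes the triangle in the graph of $y = f_n(x)$ sits on the base $[0, 1/n]$ with its peak at $x = 1/(2n)$, which is consistent with the area $1/\sqrt{n}$ quoted above for height $2\sqrt{n}$, and it integrates over [0, 1] with a midpoint rule.

```python
import math

def triangle(n, height, x):
    """Triangular spike on [0, 1/n] with its peak at x = 1/(2n); zero elsewhere.
    The base [0, 1/n] is an assumption consistent with the areas quoted in the text."""
    if x <= 0.0 or x >= 1.0 / n:
        return 0.0
    half = 1.0 / (2 * n)
    return height * (x / half if x <= half else (1.0 / n - x) / half)

def integral_0_1(n, height, cells=200000):
    """Midpoint-rule approximation of the integral of the spike over [0, 1]."""
    h = 1.0 / cells
    return h * sum(triangle(n, height, (k + 0.5) * h) for k in range(cells))

for n in (4, 25, 100):
    # Height 2n keeps the area fixed; height 2*sqrt(n) lets the area shrink to 0.
    print(n, integral_0_1(n, 2 * n), integral_0_1(n, 2 * math.sqrt(n)))
```

With height $2n$ the integrals do not tend to the integral of the limit function 0, while with height $2\sqrt{n}$ they do, even though neither sequence converges uniformly.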


EXERCISES 

i. if/(x>=i 


cos nx 


i n 


show that 


J 'tr/2 

fix) 

0 


dx 


00 /_ i 

2 Justify your reasoning. 

n =0 (Zn + 1 j 


^ Tj£ . r, x sin 3x , sin 5x , sin 7x , 

2. If /(x) = — : — — + — — — + ~ — — +• 


find a series for 


r-jr /2 

fix) 

JO 


dx. 


1*2 3*4 5*6 

3. If $f_n(x) = nxe^{-nx^2}$ and $f(x) = \lim_{n \to \infty} f_n(x)$, show that the sequence converges
nonuniformly on the interval $0 \le x \le 1$, and that

$$\int_0^1 f(x)\,dx \ne \lim_{n \to \infty} \int_0^1 f_n(x)\,dx.$$


2nx 


4. If /(x) = lim /n (x), where f n (x ) = ~t~ 4 ’ find 

n-Kx 1 i FT X 
What do you conclude about uniform convergence of the sequence on the interval

5. From the series expansions of $e^t$, $\sin t$, and $\cos t$ in powers of t (see §19.1), verify
that $\int_a^b e^t\,dt = e^b - e^a$ and that $\int_0^x \sin t\,dt = 1 - \cos x$, $\int_0^x \cos t\,dt = \sin x$. Prove whatever
you need about uniform convergence for applying Theorem IV.

20.5 / DIFFERENTIATION OF SEQUENCES AND SERIES 

We now come back to questions B, D, B', and D' of §20. We shall answer D and
D', and in so doing shall answer B and B'. That is, we shall give conditions which
will not only assure us that the limit function is differentiable, but will tell us
how to express the derivative itself as a limit function. Here again uniform
convergence is the key to the situation. But now we must look for uniform
convergence of the sequence $\{f_n'(x)\}$.

THEOREM V. Let the functions $f_n(x)$ be defined and have continuous derivatives
on the interval $a \le x \le b$. Let the sequence $\{f_n'(x)\}$ be uniformly convergent
on the interval, and let the sequence $\{f_n(x)\}$ itself be convergent, with
limit function $f(x)$. Then $f$ is differentiable, and

$$f'(x) = \lim_{n \to \infty} f_n'(x). \tag{20.5-1}$$

For infinite series this is equivalent to the statement that if

$$f(x) = \sum_{n=1}^{\infty} u_n(x)$$

is convergent and if the series of derivatives

$$\sum_{n=1}^{\infty} u_n'(x)$$

is uniformly convergent, then

$$f'(x) = \sum_{n=1}^{\infty} u_n'(x). \tag{20.5-2}$$

Here it is assumed that each term $u_n(x)$ has a continuous derivative.

Proof. Let us denote the limit of the sequence $\{f_n'(x)\}$ by $g(x)$; we do not yet
know that $g(x) = f'(x)$. Since $f_n'(x)$ converges uniformly to $g(x)$, Theorem IV
permits us to write

$$\int_a^x g(t)\,dt = \lim_{n \to \infty} \int_a^x f_n'(t)\,dt, \tag{20.5-3}$$

for the convergence is also uniform on any subinterval $a \le t \le x$, where x is any
point of [a, b]. Now

$$\int_a^x f_n'(t)\,dt = f_n(x) - f_n(a).$$

Since $f_n(x) \to f(x)$ as $n \to \infty$, (20.5-3) becomes

$$\int_a^x g(t)\,dt = f(x) - f(a),$$

or

$$f(x) = \int_a^x g(t)\,dt + f(a). \tag{20.5-4}$$

Now g is continuous, by Theorem III. We conclude from (20.5-4) and Theorem
VII, §1.52, that $f$ is differentiable, with $f'(x) = g(x)$. This result is just another
way of stating (20.5-1). The transition from (20.5-1) to (20.5-2) is accomplished
in the same way as was done for (20.4-1) and (20.4-2) at the end of the proof of
Theorem IV.

Example. Use Theorem V to aid in obtaining the series expansions, valid if
$|x| < 1$,

$$\frac{1}{(1 - x)^2} = 1 + 2x + 3x^2 + 4x^3 + \cdots + (n + 1)x^n + \cdots, \tag{20.5-5}$$

$$\frac{2}{(1 - x)^3} = 2 + 3 \cdot 2x + 4 \cdot 3x^2 + \cdots + (n + 2)(n + 1)x^n + \cdots. \tag{20.5-6}$$

We start from the geometric series

$$\frac{1}{1 - x} = 1 + x + x^2 + \cdots + x^n + \cdots, \tag{20.5-7}$$

which we know to be convergent if $|x| < 1$. The series (20.5-5) results from
(20.5-7) according to the rule of (20.5-2), and (20.5-6) is derived from (20.5-5) in
a similar manner. In order to justify this procedure by Theorem V we must
prove that each of the series (20.5-5), (20.5-6) is uniformly convergent if
$-r \le x \le r$ for any r such that $0 < r < 1$. If x is any fixed point such that
$-1 < x < 1$, we can choose r so that $|x| < r < 1$. Then x will be in an interval of
uniform convergence, and the foregoing formulas will be justified.

Suppose then that $0 < r < 1$. If $-r \le x \le r$ we have

$$|(n + 1)x^n| \le (n + 1)r^n.$$

Now consider the series

$$1 + 2r + 3r^2 + \cdots + (n + 1)r^n + \cdots. \tag{20.5-8}$$

This is a series of positive constants. It is convergent, for the ratio of successive
terms is

$$\frac{(n + 2)r^{n+1}}{(n + 1)r^n} = \frac{n + 2}{n + 1}\,r,$$

and this ratio converges to r as $n \to \infty$. Since $r < 1$, the series (20.5-8) is
convergent, by Theorem VII, §19.22. The uniform convergence of the series
(20.5-5) is now a consequence of the M-test (Theorem II, §20.2); the M-series in
this case is (20.5-8).

The proof that the series (20.5-6) converges uniformly if $|x| \le r$, $r < 1$, is
entirely similar, and the details are left to the student.
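A quick numerical check of the differentiated geometric series may help fix ideas. The sketch below is an illustration under stated assumptions (r = 0.8, 60 terms, a sampling grid on $[-r, r]$, and a long finite sum standing in for the infinite tail of the M-series): it verifies that the error of the partial sum of $1 + 2x + 3x^2 + \cdots$ is bounded on $[-r, r]$ by the tail of $1 + 2r + 3r^2 + \cdots$.

```python
def derivative_series(x, terms):
    """Partial sum 1 + 2x + 3x^2 + ... with the given number of terms."""
    return sum((n + 1) * x ** n for n in range(terms))

r, terms = 0.8, 60  # illustrative values with 0 < r < 1
# Tail of the M-series, truncated far out as a stand-in for the full tail.
m_tail = sum((n + 1) * r ** n for n in range(terms, 5000))
xs = [-r + 2 * r * k / 400 for k in range(401)]
worst = max(abs(1.0 / (1.0 - x) ** 2 - derivative_series(x, terms)) for x in xs)
print(worst, m_tail)
```

The largest sampled error occurs at x = r, where it coincides with the tail of the M-series; everywhere else on the interval it is smaller.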


EXERCISES 

1. Which of the following functions $f(x)$ has the property that $f'(x)$ can be calculated
for each x on the specified interval by differentiating the series for $f(x)$ term by term?

(a) f(x) = 2 —T •» 0^x^2tt. 

n= 1 n 

(b) /(x)= i (-1)" +, C OSxSl. 

n = l n 


-\%x^ 1 . 


(£) 

(d) /(JC) = S„2frT’ - 1<x<1 - 


(e) f(x) =2 ( — - — + — V 0 < x < it. 

^t\x- tiTT mr) 

x 2 x 4 x 2n 

(f) fix) = 1 - Y + + (~ 1)W 2 *. 4 2 . . (2y|) * + ‘ 

2. Prove that the series (20.5-6) converges uniformly if $|x| \le r$, where $0 < r < 1$.

3. Let $f_n(x) = (2/\pi)x \tan^{-1} nx$, $f(x) = \lim_{n \to \infty} f_n(x)$. Show that $f(x) = |x|$, so that it is
clear that $f$ is not differentiable at x = 0. Show that $\lim_{n \to \infty} f_n'(x)$ exists for each x,
including x = 0. What do you conclude about the uniformity of convergence of the
sequence of derivatives?

4. Let $f(x) = \lim_{n \to \infty} f_n(x)$, $f_n(x) = (1/n)e^{-n^2 x^2}$. Show that $\lim_{n \to \infty} f_n'(x) = f'(x)$ for
every x, but that the convergence of the sequence of derivatives is not uniform in any
interval containing the origin. The original sequence is convergent uniformly on the entire
axis.
21 / GENERAL REMARKS

The purpose of this chapter is to give a systematic exposition of some of the 
most important things about power series. 

The representation of functions by power series is one of the most useful of 
mathematical techniques in a wide variety of situations. Sometimes we start 
from a function that is defined for us in some manner not employing series, and 
seek to expand the function in a power series. At other times we may form a 
power series, or have one presented to us, and then undertake to use this 
function in some way. In either of these situations we need to know something 
of what properties a function has if it is defined by a power series. 

The general form of a power series is

$$a_0 + a_1(x - x_0) + a_2(x - x_0)^2 + \cdots + a_n(x - x_0)^n + \cdots; \tag{21-1}$$

this is called a power series in $x - x_0$. Here $x_0$ is fixed and x is variable. In the
special case where $x_0 = 0$ the series takes the form

$$a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n + \cdots. \tag{21-2}$$

It turns out that in studying power series it is sufficient to consider (21-2), since
the general case (21-1) can be reduced to (21-2) by a translation of the origin along
the x-axis. For this reason all the general theory of power series will be developed
for series of powers of x, of the form (21-2).

For convenience we shall often refer to the series (21-2) as the series
$\sum a_n x^n$, omitting the limits n = 0 and $\infty$ for greater ease in printing.

21.1 / THE INTERVAL OF CONVERGENCE 

The first important fact about a power series is expressed in the following 
theorem: 

THEOREM I. Suppose a power series $\sum a_n x^n$ is convergent for $x = x_0$, where
$x_0 \ne 0$. Then it is absolutely convergent when $|x| < |x_0|$. The same conclusion
holds under the weaker assumption that there is some positive constant A
such that $|a_n x_0^n| \le A$ for all values of n.

Proof. If $\sum a_n x_0^n$ is convergent, $a_n x_0^n \to 0$, and therefore the terms $a_n x_0^n$ are
bounded. Under the hypothesis in the theorem we have

$$|a_n x^n| = |a_n x_0^n| \left| \frac{x}{x_0} \right|^n \le A \left| \frac{x}{x_0} \right|^n.$$

The geometrical series

$$\sum A \left| \frac{x}{x_0} \right|^n$$

is convergent if $|x| < |x_0|$; therefore the series $\sum |a_n x^n|$ is convergent, by the
comparison test (Theorem II, §19.2). This proves the theorem.


Conceivably a particular power series may be convergent for all values of x.
This is the case with the series for $e^x$, for instance (see (19.1-6)). It is also
conceivable that a series may not be convergent for any values of x except the
one value x = 0. Such is the case with the series

$$1 + x + 2!\,x^2 + \cdots + n!\,x^n + \cdots,$$

for, with $u_n = n!\,x^n$, if $|x| \ne 0$,

$$\left| \frac{u_{n+1}}{u_n} \right| = (n + 1)|x| \to +\infty$$

as $n \to \infty$, and so the series is divergent, by Theorem XIII, §19.4.

Let us now consider the case in which the series $\sum a_n x^n$ is convergent for at
least one nonzero value of x, but is also divergent for some value of x. Let us
denote by R the least upper bound of the positive numbers x for which $\sum a_n x^n$
is convergent. Denote this set of positive numbers x by S. The set S is not
empty, for by Theorem I, if the series is convergent for the nonzero value $x_0$, it
is convergent for all positive x such that $x < |x_0|$. The set S must be bounded,
because of our assumption that the series is not convergent for all values of x.
For if S were not bounded, then for any x we could find an $x_0$ in S such that
$|x| < x_0$, and this would imply that $\sum a_n x^n$ is convergent. Since S is not empty
and has an upper bound, it has a least upper bound (Theorem II, §2.7). This
justifies the introduction of the number R.

We now assert the following: The series $\sum a_n x^n$ is absolutely convergent if
$|x| < R$ and is divergent if $|x| > R$. The proof is a simple consequence of Theorem
I. Suppose $|x| < R$. Then choose $x_0$ so that $x_0$ is in the class S and $|x| < x_0 \le R$;
this is possible, since R is the least upper bound of S. But then the series $\sum a_n x^n$
converges absolutely, by Theorem I. On the other hand, suppose that $|x| > R$.
Then the series $\sum a_n x^n$ cannot converge, for if it did, Theorem I would assure us
that $\sum a_n y^n$ converges if $R < y < |x|$, so that y would be in S, contrary to the fact that
R is the least upper bound of S.

We sum up the foregoing conclusions in theorem form. 


THEOREM II. For a power series $\sum a_n x^n$ there are three possibilities:

1. It is absolutely convergent for all values of x.

2. It diverges for every $x \ne 0$.

3. There is a positive number R such that the series converges absolutely if
$|x| < R$ and diverges if $|x| > R$.

In case (3) we call R the radius of convergence of the series. The interval
$-R < x < R$ is called the interval of convergence. The series may or may not
converge at the end points $x = \pm R$. It may converge at both, at just one, or at
neither. These possibilities were illustrated in Chapter 19.

In case (1) we say that the radius of convergence is infinite, and that the
interval of convergence is the entire x-axis. It is a convenient symbolism to write
$R = \infty$ in this case. In case (2) we write R = 0; here there is no interval of
convergence.

THEOREM III. Suppose that the series $\sum a_n x^n$ has a positive or infinite radius
of convergence R. Let $0 < r < R$ (if $R = \infty$, r may be any positive number).
Then the series converges uniformly on the closed interval $-r \le x \le r$.

Proof. The series $\sum |a_n| r^n$ is convergent, since the power series converges
absolutely at x = r, by Theorem II. If $-r \le x \le r$ we have $|a_n x^n| \le |a_n| r^n$. The
uniform convergence is therefore a consequence of the Weierstrass M-test
(Theorem II, §20.2), with $M_n = |a_n| r^n$.

In the cases most commonly arising in practice, the radius of convergence of
a power series may be found by using d’Alembert’s ratio test (Theorem XIII,
§19.4), as illustrated by Examples 1 and 2 in §19.4. If the limit

$$\lim_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right| = L \tag{21.1-1}$$

exists, and if we write $u_n = a_n x^n$, then

$$\lim_{n \to \infty} \left| \frac{u_{n+1}}{u_n} \right| = \lim_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right| |x| = L|x|;$$

the series converges if $L|x| < 1$, and diverges if $L|x| > 1$. From this we conclude
that $R = 1/L$ if $L \ne 0$. Also, $R = \infty$ if $L = 0$, and $R = 0$ if $L = \infty$. As a formal
statement we have

THEOREM IV. The radius of convergence of the series $\sum a_n x^n$ is given by

$$R = \lim_{n \to \infty} \left| \frac{a_n}{a_{n+1}} \right|, \tag{21.1-2}$$

provided that the limit exists or is $+\infty$.

There are power series for which the limit (21.1-2) does not exist; there is
then a means of determining R by examining the sequence $|a_n|^{1/n}$. This problem is
considered in §21.5.
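The ratio formula for the radius of convergence lends itself to a direct computation. The sketch below is only an illustration: it evaluates $|a_n / a_{n+1}|$ exactly for a few coefficient sequences. The sequences $1/n!$ and $n!$ correspond to the series for $e^x$ and the series $\sum n!\,x^n$ mentioned in this section; the other two are hypothetical examples added here, and the helper name `ratio` is not from the text.

```python
from fractions import Fraction
from math import factorial

def ratio(a, n):
    """|a_n / a_{n+1}| as an exact fraction; its limit is R when the limit exists."""
    return abs(Fraction(a(n)) / Fraction(a(n + 1)))

examples = {
    "a_n = 2^n": lambda n: Fraction(2) ** n,            # ratios are all 1/2, so R = 1/2
    "a_n = 1/n!": lambda n: Fraction(1, factorial(n)),  # ratios n + 1 grow without bound
    "a_n = n!": lambda n: Fraction(factorial(n)),       # ratios 1/(n + 1) tend to 0
    "a_n = n + 1": lambda n: Fraction(n + 1),           # ratios tend to 1, so R = 1
}
for name, a in examples.items():
    print(name, [float(ratio(a, n)) for n in (1, 10, 100)])
```

A finite list of ratios can only suggest the limit; the theorem itself requires that the limit exist or be $+\infty$.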
Theorem III has important consequences, the chief immediate ones of which
have to do with the function defined by a power series.

THEOREM V. Let a function f be defined by

$$f(x) = \sum_{n=0}^{\infty} a_n x^n, \tag{21.1-3}$$

where it is assumed that the radius of convergence of the power series is not
zero. The function f so defined is continuous in the open interval of convergence
of the series. Moreover, if a and b are points of this interval,

$$\int_a^b f(x)\,dx = \sum_{n=0}^{\infty} a_n \frac{b^{n+1} - a^{n+1}}{n + 1}; \tag{21.1-4}$$

that is, the integral of the function is equal to the series obtained by
integrating the original power series term by term.

Proof. The continuity of $f$ is a direct consequence of Theorem III, §20.3, for
if $x_0$ is any point of the interval of convergence, we may choose $r > 0$ so that
$|x_0| < r < R$, and the series converges uniformly on the interval $-r \le x \le r$, which
includes the point $x_0$. The assertion (21.1-4) is a direct application of (20.4-2) in
Theorem IV, §20.4, for the power series converges uniformly on the closed
interval [a, b], each point of which is interior to the interval of convergence.


Theorem V leaves something to be desired in one respect. It does not give us
any information about what happens at an end of the interval of convergence.
Suppose, for example, that the series (21.1-3) happens to be convergent at x = R,
where R is the radius of convergence. Will the function $f$ be continuous at x = R?
That is, will

$$\sum_{n=0}^{\infty} a_n x^n \to \sum_{n=0}^{\infty} a_n R^n$$

as $x \to R$ from the left? The answer to this question is in the affirmative, but the
proof is not covered in Theorem V. We shall take up this matter in §21.4. A
similar question arises with regard to (21.1-4). Is it ever possible to put b = R or
a = -R? This also is considered in §21.4.

There are many important practical applications of Theorem V, particularly
of (21.1-4).


Example 1. Derive the series expansion

$$\sin^{-1} x = x + \frac{1}{2}\,\frac{x^3}{3} + \frac{1 \cdot 3}{2 \cdot 4}\,\frac{x^5}{5} + \frac{1 \cdot 3 \cdot 5}{2 \cdot 4 \cdot 6}\,\frac{x^7}{7} + \cdots. \tag{21.1-5}$$

We start from the fact that

$$\sin^{-1} x = \int_0^x \frac{dt}{\sqrt{1 - t^2}}. \tag{21.1-6}$$

Formula (21.1-6) is valid if $|x| \le 1$; the integral is improper if $x = \pm 1$, since the
integrand becomes infinite at $t = \pm 1$. In the binomial series (19.5-5) we replace x
by $-t^2$ and set $m = -\tfrac{1}{2}$. The result is

$$(1 - t^2)^{-1/2} = 1 - \tfrac{1}{2}(-t^2) + \frac{(-\tfrac{1}{2})(-\tfrac{3}{2})}{2!}(-t^2)^2 + \frac{(-\tfrac{1}{2})(-\tfrac{3}{2})(-\tfrac{5}{2})}{3!}(-t^2)^3 + \cdots$$

$$= 1 + \frac{1}{2}t^2 + \frac{1 \cdot 3}{2 \cdot 4}t^4 + \frac{1 \cdot 3 \cdot 5}{2 \cdot 4 \cdot 6}t^6 + \cdots;$$

this result is valid if $|t| < 1$, and the series has radius of convergence R = 1.
Hence we may integrate the series from t = 0 to t = x if $|x| < 1$. Using (21.1-6),
we obtain the result (21.1-5). This argument, based on Theorem V, shows that
(21.1-5) is valid if $|x| < 1$. As a matter of fact, one can easily verify that the
series converges even when $x = \pm 1$, by using Raabe’s test (Theorem XIV, §19.4);
it is then a consequence of the discussion in §21.4 that (21.1-5) is valid if
$-1 \le x \le 1$.
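The series for $\sin^{-1} x$ derived in Example 1 can be compared with a library value. This is a minimal sketch, with the function name `arcsin_series` and the sample points chosen here for illustration; it builds the coefficients $\frac{1 \cdot 3 \cdots (2k-1)}{2 \cdot 4 \cdots (2k)}$ by a running product.

```python
import math

def arcsin_series(x, terms):
    """Partial sum of x + (1/2) x^3/3 + (1*3)/(2*4) x^5/5 + ... with the given number of terms."""
    total, coeff = 0.0, 1.0  # coeff = (1*3*...*(2k-1)) / (2*4*...*(2k)), equal to 1 at k = 0
    for k in range(terms):
        total += coeff * x ** (2 * k + 1) / (2 * k + 1)
        coeff *= (2 * k + 1) / (2 * k + 2)  # advance the coefficient from k to k + 1
    return total

for x in (0.1, 0.5, 0.9):
    print(x, arcsin_series(x, 200), math.asin(x))
```

Agreement is rapid for small |x| and slower near the end points, where the convergence of the series rests on the finer arguments referred to in the example.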

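Example 2. Find an expansion in powers of x of the function

$$f(x) = \int_0^1 \frac{1 - e^{-tx}}{t}\,dt,$$

and use the result to calculate $f(\tfrac{1}{2})$ approximately.

From the series (19.1-6) for the exponential function we have

$$e^{-tx} = 1 - tx + \frac{t^2 x^2}{2!} - \frac{t^3 x^3}{3!} + \cdots.$$

Therefore

$$\frac{1 - e^{-tx}}{t} = x - \frac{t x^2}{2!} + \frac{t^2 x^3}{3!} - \cdots + (-1)^{n-1}\frac{t^{n-1} x^n}{n!} + \cdots.$$

This series representation is valid for all values of x and t; the radius of
convergence of the power series in t is $R = \infty$. Integrating, we have

$$f(x) = x - \frac{x^2}{2 \cdot 2!} + \frac{x^3}{3 \cdot 3!} - \cdots + (-1)^{n-1}\frac{x^n}{n \cdot n!} + \cdots.$$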

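In particular,

$$f(\tfrac{1}{2}) = \int_0^1 \frac{1 - e^{-t/2}}{t}\,dt = \frac{1}{2} - \frac{1}{2 \cdot 2!}\left(\frac{1}{2}\right)^2 + \frac{1}{3 \cdot 3!}\left(\frac{1}{2}\right)^3 - \cdots = 0.44 \text{ (approximately)}.$$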
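The term-by-term integration in Example 2 can be cross-checked against direct numerical integration. The sketch below assumes the function in question is $f(x) = \int_0^1 (1 - e^{-tx})/t\,dt$, so that the integrated series is $x - x^2/(2 \cdot 2!) + x^3/(3 \cdot 3!) - \cdots$; the helper names and the choice of Simpson's rule are illustrative assumptions, not part of the text.

```python
import math

def f_series(x, terms=25):
    """x - x^2/(2*2!) + x^3/(3*3!) - ..., from term-by-term integration over 0 <= t <= 1."""
    return sum((-1) ** (n - 1) * x ** n / (n * math.factorial(n)) for n in range(1, terms + 1))

def f_quadrature(x, cells=2000):
    """Composite Simpson rule for the integral of (1 - exp(-t x)) / t over [0, 1]."""
    def g(t):
        # The integrand tends to x as t -> 0, so that value is used at t = 0.
        return x if t == 0.0 else -math.expm1(-t * x) / t
    h = 1.0 / cells  # cells must be even for Simpson's rule
    inner = sum((4 if k % 2 else 2) * g(k * h) for k in range(1, cells))
    return (g(0.0) + inner + g(1.0)) * h / 3.0

print(f_series(0.5), f_quadrature(0.5))
```

Because the power series in t has infinite radius of convergence, the two computations agree for any x; the point x = 1/2 is used only as a sample.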


EXERCISES 

1. Find the radius of convergence of each of the following series. 


/ \ y (2«)1 i 
(a) ^ 


(b) 2 


(njr 

(3n)! 


/ x y v n 
(C) ^ (n!) 1 ' 

(d) ^ (4n)! ' 
(e) 2 p and q positive integers. 

(f) 2 f 1 ’ 3 ‘ ‘ • (2n - l)f 


2 2n (2 n)\ x 


2. Find the radius of convergence of the series 2 q * * 

integer and q >0. Consider all possibilities. 

3. Find the radius of convergence of the following series: 


where p is a positive 


[q(q - 2) • • • (q - 2n + 2)][(q + l)(q + 3) • • • (q + 2n 
(2 n)l 


1 )] 


where q is not a positive even integer. 

4. Let the radii of convergence of $\sum a_n x^n$ and $\sum b_n x^n$ be $R_1$ and $R_2$, respectively.
Suppose $|a_n| \le |b_n|$ for all sufficiently large values of n. Prove that $R_1 \ge R_2$.

5. Let $\{a_n\}$ be a bounded sequence, and let R be the radius of convergence of the
series $\sum a_n x^n$. Prove that $R \ge 1$.

6. If the radius of convergence of $\sum a_n x^n$ is R, prove that the radius of convergence of
$\sum a_n x^{2n}$ is $R^{1/2}$.

7. Find an expansion as a power series in x for each of the following functions. 
Indicate the radius of convergence in each case. 

(a) [ X ^dt. (c) r e~ ,2 dt. 

JO t Jo 

J * f X t P 

cos u 2 du. (d) 2 dt, p a positive integer. 

o Jo V 1 — t 

8. Deduce the expansion

$$\sinh^{-1} x = x + \sum_{n=1}^{\infty} (-1)^n \frac{1 \cdot 3 \cdots (2n - 1)}{2 \cdot 4 \cdots 2n}\,\frac{x^{2n+1}}{2n + 1}.$$

State the justification of your procedure, and indicate for what values of x you are
proving the validity of the expansion.

9. If f(x)= [ u ^ du, find a series for f(x), and calculate the approximate 

Jo u 

value of f(w). 

10. Find the approximate value of f - — dx. 

Jo x 


11. Find a series for the function f(x) = [ * '• dt , and calculate the ap- 

Jo 1+ t 

proximate value of f(io). See Example 2, §19.6. 

fi/2 g~* 2 

12. Find the approximate value of , dx. 

K vi-x 1 

13. Use series (21.1-6) to write numerical series for $\pi/2$ and $\pi/6$, respectively.


21.2 / DIFFERENTIATION OF POWER SERIES 

The principal fact to be established in this section is the following: If a function 
is defined by a power series which has a positive or infinite radius of con- 
vergence, then the function has derivatives of all orders at each point of the 
open interval of convergence, and these derivatives are represented by the series
which are obtained by differentiation of the original series term by term. Thus, if

$$f(x) = a_0 + a_1x + a_2x^2 + a_3x^3 + a_4x^4 + \cdots \tag{21.2-1}$$

is convergent when $|x| < R$, then

$$f'(x) = a_1 + 2a_2x + 3a_3x^2 + 4a_4x^3 + \cdots \tag{21.2-2}$$

$$f''(x) = 2a_2 + 3\cdot 2\,a_3x + 4\cdot 3\,a_4x^2 + \cdots \tag{21.2-3}$$

$$f'''(x) = 3!\,a_3 + 4\cdot 3\cdot 2\,a_4x + \cdots, \tag{21.2-4}$$

and so on, all these series likewise being convergent if $|x| < R$. To get at these
facts we begin with a consideration of the series (21.2-1) and (21.2-2).


THEOREM VI. In the case of any power series, the series in (21.2-1) and the series
in (21.2-2) have the same radius of convergence.


Proof. Let $R$ and $R'$ denote the radii of convergence of the series (21.2-1)
and (21.2-2), respectively. Suppose $|x| < R$, and choose $x_0$ so that $|x| < |x_0| < R$.
Then the series (21.2-1) is convergent with $x = x_0$, and consequently $a_n x_0^n \to 0$.
We may therefore choose a number $A > 0$ such that $|a_n x_0^n| \le A$ for all n. Then

$$n a_n x^{n-1} = a_n x_0^n \cdot \frac{n}{x_0}\left(\frac{x}{x_0}\right)^{n-1},$$

$$|n a_n x^{n-1}| \le \frac{A}{|x_0|}\, n r^{n-1}, \tag{21.2-5}$$

where $r = |x/x_0| < 1$. The series

$$\sum \frac{A}{|x_0|}\, n r^{n-1}$$

is convergent, for the limit of the ratio of the term of index n + 1 to the term of
index n is

$$\lim_{n\to\infty} \frac{n+1}{n}\, r = r < 1.$$

Consequently, by (21.2-5), the series

$$\sum n a_n x^{n-1}$$

is convergent. This is precisely the series (21.2-2), and so we have proved that
this series converges if $|x| < R$. It follows that the radius of convergence of
(21.2-2) is not less than that of (21.2-1), i.e. $R' \ge R$. If $R = \infty$ this means that
$R' = \infty$ also.

To complete the proof we show that $R' > R$ is impossible, so that $R' = R$.
For suppose that $R' > R$ and choose x so that $R < |x| < R'$. Then, for this x, the
series (21.2-2) is absolutely convergent and the series (21.2-1) is divergent. Now

$$|a_n x^n| = |n a_n x^{n-1}| \left|\frac{x}{n}\right| \le |n a_n x^{n-1}|$$

as soon as $n > |x|$. This comparison shows that the series (21.2-1) must be
convergent, which is a contradiction. The proof of Theorem VI is now complete.

THEOREM VII. Let f(x) be defined by the power series (21.2-1), and assume 
that the radius of convergence is not zero. Then f is differentiable at each 
point inside the interval of convergence, and f'(x ) is represented by the series 
(21.2-2). Application of this conclusion to f' in place of f shows that f”(x) is 
represented by (21.2-3), and so on. 

The proof is an immediate consequence of Theorem V, §20.5, and Theorem 
III, §21.1, in view of Theorem VI of the present section. 

It is possible that the series for $f'(x)$ will diverge at an end of the interval of
convergence, even though the series for $f(x)$ may be convergent at that point.
An example is furnished by the series for $\log(1 + x)$ at x = 1 [see (19.1-1)]; this is
convergent, but the series for the derivative, $(1 + x)^{-1}$, is divergent at x = 1.

Theorem VII shows that there is a great difference between functions 
defined by power series and functions defined by series of less special type. 
Another very important class of series is the class of trigonometric series, of 
which 

$$\frac{\sin x}{1^2} + \frac{\sin 2x}{2^2} + \cdots + \frac{\sin nx}{n^2} + \cdots$$

is an example. We saw in Example 2, §20.1, that this series is uniformly
convergent for all values of x. If we differentiate the series term by term we get

$$\frac{\cos x}{1} + \frac{\cos 2x}{2} + \frac{\cos 3x}{3} + \cdots + \frac{\cos nx}{n} + \cdots.$$

This series is convergent for some values of x, but not for all values; for
instance, it is divergent when x = 0. Another termwise differentiation gives the
series

$$-\sin x - \sin 2x - \cdots - \sin nx - \cdots,$$

which is convergent if x = 0, but is divergent except when x is an integral multiple of
$\pi$.

Theorem VII permits us to show the relation between the general theory of 
power series and the expansion of a function in a power series by means of 
Taylor’s series. In the Taylor’s series expansion of a function in powers of x the
coefficient of $x^n$ is $f^{(n)}(0)/n!$ (see (19.1-4)).

THEOREM VIII. If a function f(x) is defined by a power series (21.2-1) with
positive or infinite radius of convergence, the coefficients are related to the
function by the formulas

$$a_n = \frac{f^{(n)}(0)}{n!}. \tag{21.2-6}$$

This means that the power series is the Taylor's series of the function.

The formulas (21.2-6) are established by setting x = 0 in the successive
series for $f(x)$, $f'(x)$, $f''(x)$, etc. (see (21.2-1)-(21.2-4)). It is clear by induction
that the leading term in the series for $f^{(n)}(x)$ is $n!\,a_n$, and (21.2-6) is a direct
consequence.
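The relation $a_n = f^{(n)}(0)/n!$ can be illustrated with exact arithmetic. The sketch below is only an illustration on a truncated coefficient list (the helper names `differentiate` and `coefficient_from_derivative` are hypothetical): it differentiates the list term by term n times, reads off the constant term, and divides by n!.

```python
from fractions import Fraction
from math import factorial

def differentiate(coeffs):
    """Termwise derivative: [a0, a1, a2, ...] -> [a1, 2*a2, 3*a3, ...]."""
    return [n * a for n, a in enumerate(coeffs)][1:]

def coefficient_from_derivative(coeffs, n):
    """Return f^(n)(0) / n!, where f^(n) is obtained by n termwise differentiations."""
    derived = list(coeffs)
    for _ in range(n):
        derived = differentiate(derived)
    return Fraction(derived[0]) / factorial(n)   # derived[0] is f^(n)(0)

# Truncated series for e^x: a_k = 1/k!
coeffs = [Fraction(1, factorial(k)) for k in range(8)]
recovered = [coefficient_from_derivative(coeffs, n) for n in range(8)]
print(recovered == coeffs)
```

Each recovered value equals the original coefficient, which is exactly the content of (21.2-6) for this example.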


THEOREM IX. Suppose that two power series are convergent and have the same
sum for all values of x in some interval $|x| < r$:

$$\sum_{n=0}^{\infty} a_n x^n = \sum_{n=0}^{\infty} b_n x^n, \qquad -r < x < r.$$

Then $a_n = b_n$ for all n.


This theorem is called the uniqueness theorem for power series. It is a
corollary of Theorem VIII. For, let f(x) be the common sum of the two series.
Then by (21.2-6) we see that $a_n$ and $b_n$ are both equal to $f^{(n)}(0)/n!$, and therefore
equal to each other.

Example 1. Consider the functions $J_0(x)$, $J_1(x)$ defined as follows:

$$J_0(x) = 1 - \frac{x^2}{(1!)^2 2^2} + \frac{x^4}{(2!)^2 2^4} - \cdots + (-1)^n \frac{x^{2n}}{(n!)^2 2^{2n}} + \cdots \tag{21.2-7}$$

$$J_1(x) = \frac{x}{2}\left[1 - \frac{x^2}{1!\,2!\,2^2} + \cdots + (-1)^n \frac{x^{2n}}{n!\,(n+1)!\,2^{2n}} + \cdots\right] \tag{21.2-8}$$

Show that they are defined for all values of x, and that

$$J_0'(x) = -J_1(x). \tag{21.2-9}$$

The function $J_0(x)$ is called the Bessel function of order zero of first kind;
$J_1(x)$ is called the Bessel function of order one of first kind. These and other
varieties of Bessel functions are of great importance because of the way they
arise in many kinds of physical problems.

Both series are convergent for all values of x, as is readily verified by the
ordinary ratio test (Theorem XIII, §19.4). To verify (21.2-9) we write the series
for $J_0(x)$ and $J_1(x)$ in the forms

$$J_0(x) = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{n!\,n!\,2^{2n}}, \qquad J_1(x) = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{n!\,(n+1)!\,2^{2n+1}}.$$

Then, calculating $J_0'(x)$ in the manner justified by Theorem VII, we have

$$J_0'(x) = \sum_{n=1}^{\infty} (-1)^n \frac{2n\,x^{2n-1}}{n!\,n!\,2^{2n}} = \sum_{n=1}^{\infty} (-1)^n \frac{x^{2n-1}}{(n-1)!\,n!\,2^{2n-1}}.$$

In the series for $J_0(x)$ the term with n = 0 is a constant, so that the series for
$J_0'(x)$ will begin with the term for which n = 1. If we now write n + 1 instead of n
in the last summation, the new index n will go from 0 to $\infty$, and we shall have

$$J_0'(x) = \sum_{n=0}^{\infty} (-1)^{n+1} \frac{x^{2n+1}}{n!\,(n+1)!\,2^{2n+1}}.$$

Since $(-1)^{n+1} = -(-1)^n$, we see that (21.2-9) is true, by a comparison of the
summation expressions for $J_0'(x)$ and $J_1(x)$.
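The identity (21.2-9) can also be spot-checked numerically from the series themselves. The following is a minimal sketch, assuming the series forms $J_0(x) = \sum (-1)^n x^{2n}/(n!\,n!\,2^{2n})$ and $J_1(x) = \sum (-1)^n x^{2n+1}/(n!\,(n+1)!\,2^{2n+1})$; the function names are illustrative. The derivative is formed term by term, as Theorem VII permits.

```python
from math import factorial

def j0(x, terms=30):
    """Partial sum of the series for J_0(x)."""
    return sum((-1) ** n * x ** (2 * n) / (factorial(n) ** 2 * 2 ** (2 * n))
               for n in range(terms))

def j1(x, terms=30):
    """Partial sum of the series for J_1(x)."""
    return sum((-1) ** n * x ** (2 * n + 1)
               / (factorial(n) * factorial(n + 1) * 2 ** (2 * n + 1))
               for n in range(terms))

def j0_prime(x, terms=30):
    """Termwise derivative of the J_0 series; the n = 0 term is constant and drops out."""
    return sum((-1) ** n * 2 * n * x ** (2 * n - 1) / (factorial(n) ** 2 * 2 ** (2 * n))
               for n in range(1, terms))

for x in (0.5, 1.0, 3.0):
    print(x, j0_prime(x) + j1(x))   # each sum should be essentially zero
```

Because both series converge for every x, a modest number of terms already gives agreement to rounding error for moderate x.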

Example 2. Determine what can be said about solutions of the differential
equation

$$(1 - x^2)\frac{d^2y}{dx^2} + 6y = 0 \tag{21.2-10}$$

of the form $y = f(x) = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n + \cdots$,
i.e., solutions which can be expanded in powers of x, convergent for some
interval about x = 0.

Assuming tentatively that there is such a series solution, we differentiate it,
obtaining

$$\frac{dy}{dx} = a_1 + 2a_2x + 3a_3x^2 + \cdots + na_nx^{n-1} + \cdots,$$

$$\frac{d^2y}{dx^2} = 2a_2 + 3\cdot 2\,a_3x + \cdots + n(n-1)a_nx^{n-2} + \cdots,$$

$$x^2\frac{d^2y}{dx^2} = 2a_2x^2 + 3\cdot 2\,a_3x^3 + \cdots + n(n-1)a_nx^n + \cdots.$$

Then

$$(1 - x^2)y'' + 6y = \sum_{n=2}^{\infty} n(n-1)a_nx^{n-2} - \sum_{n=2}^{\infty} n(n-1)a_nx^n + \sum_{n=0}^{\infty} 6a_nx^n$$

$$= \sum_{n=0}^{\infty} \left[(n+2)(n+1)a_{n+2} - n(n-1)a_n + 6a_n\right]x^n$$

(observe that the term $n(n-1)a_n$ is zero for n = 0, 1). If now (21.2-10) is to be
satisfied, we infer by Theorem IX that we must have

$$(n+2)(n+1)a_{n+2} - n(n-1)a_n + 6a_n = 0, \qquad n = 0, 1, 2, \ldots.$$

Since $n(n-1) - 6 = (n-3)(n+2)$, this last relation is equivalent to

$$a_{n+2} = \frac{(n-3)(n+2)}{(n+2)(n+1)}\,a_n = \frac{n-3}{n+1}\,a_n.$$

From this relation we may determine $a_2, a_4, a_6, \ldots$ successively in terms of an
arbitrary $a_0$; likewise $a_3, a_5, a_7, \ldots$ are determined in terms of an arbitrary $a_1$. We
have

$$a_3 = \frac{1-3}{1+1}\,a_1 = -a_1, \qquad a_5 = \frac{3-3}{3+1}\,a_3 = 0,$$

whence $a_7 = a_9 = a_{11} = \cdots = 0$. For the even subscripts,

$$a_2 = -3a_0,$$
$$a_4 = -\tfrac13 a_2,$$
$$a_6 = +\tfrac15 a_4,$$
$$\cdots$$
$$a_{2n+2} = \frac{2n-3}{2n+1}\,a_{2n}.$$

Clearly none of these coefficients is zero if $a_0 \ne 0$. Now

$$a_{2n+2}a_{2n}\cdots a_4a_2 = \frac{(2n-3)(2n-5)\cdots 1\cdot(-1)\cdot(-3)}{(2n+1)(2n-1)\cdots 5\cdot 3\cdot 1}\,a_{2n}\cdots a_2a_0;$$

when like factors are cancelled from either side of this relation we find

$$a_{2n+2} = \frac{3}{(2n+1)(2n-1)}\,a_0. \tag{21.2-11}$$

What we have done thus far shows that if there is a solution of (21.2-10) in
the form of a series of powers of x, the solution can be written

$$y = a_1(x - x^3) + a_0\left(1 + \frac{3}{1\cdot(-1)}x^2 + \frac{3}{3\cdot 1}x^4 + \cdots + \frac{3}{(2n-1)(2n-3)}x^{2n} + \cdots\right).$$

Moreover, the work shows that this really is a solution within the interval of
convergence of the series, provided there is an interval of convergence. Now the
infinite series

$$\sum_{n=0}^{\infty} \frac{3}{(2n-1)(2n-3)}\,x^{2n} \tag{21.2-12}$$

is convergent when $|x| < 1$, as may readily be verified. Thus we have found two
linearly independent solutions of the differential equation (21.2-10): the
polynomial $x - x^3$ and the infinite series (21.2-12). The coefficients $a_0$ and $a_1$ are
arbitrary.
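The coefficient bookkeeping in Example 2 is easy to mechanize. The sketch below is a hedged illustration, assuming the recurrence $a_{n+2} = \frac{n-3}{n+1}a_n$ obtained from $(1-x^2)y'' + 6y = 0$; the function name `coefficients` is hypothetical. It generates coefficients exactly with rational arithmetic so they can be compared with the closed form $a_{2n} = \frac{3}{(2n-1)(2n-3)}a_0$.

```python
from fractions import Fraction

def coefficients(a0, a1, count):
    """First `count` coefficients from the recurrence a_{n+2} = (n - 3)/(n + 1) * a_n."""
    a = [Fraction(a0), Fraction(a1)]
    for n in range(count - 2):
        a.append(Fraction(n - 3, n + 1) * a[n])
    return a

a = coefficients(1, 1, 12)
print(a[1::2])   # odd-indexed coefficients: the x - x^3 polynomial, then zeros
print(a[0::2])   # even-indexed coefficients of the infinite series
```

With $a_0 = a_1 = 1$ the odd-indexed list terminates after $a_3 = -1$, which is the polynomial solution $x - x^3$, while the even-indexed coefficients never vanish.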


EXERCISES 

1. The Bessel function of order m of first kind (m a nonnegative integer) is defined by

$$J_m(x) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(n+m)!\,n!}\left(\frac{x}{2}\right)^{2n+m}.$$

Show that (a) $J_0(x) = x^{-1}\frac{d}{dx}\bigl(xJ_1(x)\bigr)$, (b) $J_1(x) = x^{-2}\frac{d}{dx}\bigl(x^2J_2(x)\bigr)$,
(c) $J_2(x) = -x\frac{d}{dx}\bigl(x^{-1}J_1(x)\bigr)$. (d) State and prove the rule corresponding to (a) and (b) for Bessel
functions of higher orders m - 1 and m. (e) Of what general rule are (c) and (21.2-9)
special cases? Prove the correctness of your answer.

2. Using only what you know about power series, and nothing of what you know
about $e^x$, find the function f(x), defined by a power series in x, such that $f'(x) = f(x)$ and
$f(0) = 1$. What is the radius of convergence of the series?

3. Find the power series in x, denoted by f(x), such that $f''(x) + f(x) = 0$ and
$f(0) = 0$, $f'(0) = 1$. What is the radius of convergence of the series?

4. (a) Find f(x), a power series in x, such that $(1 + x)f'(x) = mf(x)$, where m is a
constant and not one of the integers 0, 1, 2, .... (b) Determine the radius of convergence
of the series. (c) Let $g(x) = \dfrac{f(x)}{(1+x)^m}$ and show from the requirement in (a) that $g'(x) = 0$,
so that g(x) is constant. What can you now conclude about f(x) and $(1 + x)^m$ for values of
x inside the interval of convergence of the series?

5. Suppose f(x) is defined by a power series in x with positive radius of
convergence. Let k be the smallest positive integer such that $f^{(k)}(0) \ne 0$. (We suppose f(x) not
a constant, so that there really is such an integer k.) (a) Show that f has neither a
relative maximum nor a relative minimum at x = 0 if k is odd. (b) Assuming that k is
even, show that f has a relative minimum at x = 0 if $f^{(k)}(0) > 0$ and a relative maximum at
x = 0 if $f^{(k)}(0) < 0$.

6. (a) If the function f(x) represented by a power series in x is even, i.e., if
$f(x) = f(-x)$, show that all the coefficients of odd powers of x in the series are zero. (b)
If the function is odd, i.e., if $f(-x) = -f(x)$, show that all the coefficients of even powers
of x are zero.

7. Suppose f(x) and g(x) are power series in x, each with a positive radius of
convergence. Suppose that $f(0) = f'(0) = \cdots = f^{(m-1)}(0) = 0$, $f^{(m)}(0) \ne 0$ and that
$g(0) = g'(0) = \cdots = g^{(n-1)}(0) = 0$, $g^{(n)}(0) \ne 0$. (a) Show that neither f(x) nor g(x) can vanish if
$0 < |x| < h$, provided h is sufficiently small. (b) Show that

$$\lim_{x\to 0} \frac{f(x)}{g(x)} = \frac{f^{(m)}(0)}{g^{(n)}(0)}$$

if m = n, and that the limit is 0 if m > n. Also show that

$$\lim_{x\to 0} \left|\frac{f(x)}{g(x)}\right| = \infty$$

if m < n.


8. Find a power series in x for ~ 


and deduce that 


1= V — 5 

in + 1)! 

9. Show that 1 = _ 2 r ^2! + 2 r 3] by considerin S Jp («" x2 )* 

OO I A J 

10. Show that 4=2 (~2) n+1 — by considering — (x 2 e^). 

n ! ax 

* x" 

11. Find an expression for the function fix) = 2 I 

« =o tn -r i ) n : 

Suggestion: Calculate successively x/(x), ^ (x/(x )), x^(x/(x)) in series form, 

and identify the last series. Then obtain fix) by integration. 

12. (a) Obtain a power-series solution y = f(x) of the differential equation
$xy'' + (1 - x)y' + qy = 0$, where q is an arbitrary constant. Find the radius of convergence of the
series. (b) If q is a nonnegative integer, show that f(x) is a polynomial of degree q. In
this latter case, if $f(0) = q!$, f(x) is denoted by $L_q(x)$ and called the Laguerre polynomial
of degree q.

13. (a) Show that the differential equation $y'' - 2xy' + 2my = 0$, where m is a constant,
has two independent power-series solutions, $f_1(x) = a_0 + a_2x^2 + a_4x^4 + \cdots$,
$f_2(x) = a_1x + a_3x^3 + \cdots$. Find the radii of convergence of these series. (b) If m is a nonnegative
integer, show that one of these solutions is a polynomial of degree m. If the coefficient of
$x^m$ in the polynomial is $2^m$, it is called the Hermite polynomial of degree m, and denoted
by $H_m(x)$.


21.3 / DIVISION OF POWER SERIES 

It is quite often advantageous to obtain the power-series expansion of a function 
by representing it as the quotient of two power series and then dividing one of 
these series by the other. We begin with an illustrative example. 


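Example 1. Find several terms in the expansion of tan x in powers of x. We
make use of the known Taylor’s series for sin x and cos x:

$$\tan x = \frac{\sin x}{\cos x} = \frac{x - \dfrac{x^3}{3!} + \dfrac{x^5}{5!} - \cdots}{1 - \dfrac{x^2}{2!} + \dfrac{x^4}{4!} - \cdots}.$$

Then we perform the long division as indicated:

$$\begin{array}{rl}
 & x + \tfrac13 x^3 + \tfrac{2}{15}x^5 + \cdots \\
1 - \tfrac12 x^2 + \tfrac{1}{24}x^4 - \cdots \;\Big)\!\!\! & x - \tfrac16 x^3 + \tfrac{1}{120}x^5 - \cdots \\
 & x - \tfrac12 x^3 + \tfrac{1}{24}x^5 - \cdots \\
 & \qquad \tfrac13 x^3 - \tfrac{1}{30}x^5 + \cdots \\
 & \qquad \tfrac13 x^3 - \tfrac16 x^5 + \cdots \\
 & \qquad\qquad\;\; \tfrac{2}{15}x^5 - \cdots
\end{array}$$

In this way we are led to the expansion

$$\tan x = x + \tfrac13 x^3 + \tfrac{2}{15}x^5 + \cdots. \tag{21.3-1}$$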

Although we have not indicated the general term of this series, it is clear that we 
may compute as many terms as we please according to this systematic procedure 
indicated in the long-division process. 

None of our previous theorems furnishes any proof of the correctness of the 
result (21.3-1). Note also that since we have not obtained a general formula for 
the coefficients in (21.3-1), we are at present unable to determine the radius of 
convergence of the series. 

The thing of the foremost practical importance is that the method of 
Example 1 really works. By long division we can find the quotient of two power 
series as another power series. The essential condition which must be fulfilled is 
that the power series in the denominator must begin with a nonzero constant 
term. Either of the two infinite series in the quotient may in particular cases 
terminate (i.e., instead of an infinite series we may have a polynomial). 
Experience shows that the long-division process is often the best method for 
obtaining a power-series representation. With a rational function, for instance, 
the long-division method is much more practical than the method of computing 
the coefficients in the Taylor’s series by differentiation. 


THEOREM X. Consider a function defined as the quotient of two power series:

$$f(x) = \frac{a_0 + a_1x + a_2x^2 + \cdots + a_nx^n + \cdots}{b_0 + b_1x + b_2x^2 + \cdots + b_nx^n + \cdots}, \tag{21.3-2}$$

where $b_0 \ne 0$, and where both of the series are convergent in some interval
$|x| < r$. Then for sufficiently small values of x the function f can be represented
as a power series

$$f(x) = c_0 + c_1x + c_2x^2 + \cdots + c_nx^n + \cdots \tag{21.3-3}$$

whose coefficients may be found by the process of long division, or, what is
equivalent, by solving the relations

$$\begin{aligned}
b_0c_0 &= a_0 \\
b_0c_1 + b_1c_0 &= a_1 \\
&\;\;\vdots \\
b_0c_n + b_1c_{n-1} + \cdots + b_nc_0 &= a_n
\end{aligned} \tag{21.3-4}$$

successively for $c_0, c_1, c_2, \ldots$.


It is impractical to give a complete proof of this theorem with the knowledge 
presently at our disposal, but we shall go as far as is conveniently possible. In 
the first place, the assumption $b_0 \ne 0$ means that the function defined by the
series $\sum b_nx^n$ is not zero at x = 0. It is therefore different from zero throughout
some interval about the origin, since the function is continuous, by Theorem V,
§21.1. If we now assume that a power series expansion of the form (21.3-3) is
valid, we shall have the product relation

$$\left(\sum b_nx^n\right)\left(\sum c_nx^n\right) = \sum a_nx^n \tag{21.3-5}$$

holding throughout some interval in which all three series are absolutely
convergent. The rule for multiplication of absolutely convergent series (see §19.6)
gives the result

$$\left(\sum_{n=0}^{\infty} b_nx^n\right)\left(\sum_{n=0}^{\infty} c_nx^n\right) = \sum_{n=0}^{\infty} (b_0c_n + b_1c_{n-1} + \cdots + b_nc_0)\,x^n. \tag{21.3-6}$$

By the uniqueness theorem for power series we then conclude from (21.3-5) and
(21.3-6) that the general relation (21.3-4) holds. The coefficients $c_n$ determined
successively in this way are the same as would be found by the long-division
process, as the student may verify for himself. Note that to solve for $c_n$ when
$c_0, \ldots, c_{n-1}$ are known, it is essential to know that $b_0 \ne 0$.

The proof is now complete except for justifying the assumption that /(x) can 
be expanded in a series (21.3-3). It is this part of the proof which we shall not 
give. It is an easy consequence of certain standard theorems in the theory of 
functions of a complex variable, but the development of this latter theory is 
beyond the scope of this text. 

Example 2. The long-division method yields the series expansion

$$\frac{2 + x}{1 + x + x^2} = 2 - x - x^2 + 2x^3 - x^4 - x^5 + 2x^6 - x^7 - x^8 + \cdots. \tag{21.3-7}$$

Detailed verification is left to the student. It may be shown by Theorem XVI,
§19.4, that the series (21.3-7) has radius of convergence R = 1.
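The relations (21.3-4) translate directly into a short routine. The sketch below is an illustration rather than anything prescribed by the text: `divide_series` is a hypothetical helper that solves $b_0c_n + b_1c_{n-1} + \cdots + b_nc_0 = a_n$ successively with exact fractions. It is applied to sin x / cos x and, assuming the quotient in Example 2 is $(2 + x)/(1 + x + x^2)$ (which is what the stated series implies), to that quotient as well.

```python
from fractions import Fraction
from math import factorial

def divide_series(a, b, count):
    """Return c_0..c_{count-1} solving b_0 c_n + b_1 c_{n-1} + ... + b_n c_0 = a_n."""
    if b[0] == 0:
        raise ValueError("the denominator series needs a nonzero constant term")
    c = []
    for n in range(count):
        # contribution of the already known coefficients c_0 .. c_{n-1}
        known = sum(b[k] * c[n - k] for k in range(1, n + 1))
        c.append(Fraction(a[n] - known) / b[0])
    return c

N = 8
sin_c = [Fraction((-1) ** (k // 2), factorial(k)) if k % 2 == 1 else Fraction(0)
         for k in range(N)]
cos_c = [Fraction((-1) ** (k // 2), factorial(k)) if k % 2 == 0 else Fraction(0)
         for k in range(N)]

tan_c = divide_series(sin_c, cos_c, N)
ex2_c = divide_series([2, 1] + [0] * 6, [1, 1, 1] + [0] * 5, N)
print(tan_c)
print(ex2_c)
```

The routine needs only the first `count` coefficients of each series, which mirrors the fact that long division can be stopped after any number of terms.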


It is important to observe that the division method of obtaining the expan- 
sions (21.3-1) and (21.3-7) is much easier than using the formula (21.2-6) for 
computing the coefficients in the expansion. 

Let us consider the situation when $b_0 = 0$ in the quotient (21.3-2). The
general case may be represented by assuming that $b_k$ is the first of the b’s which
is not 0, and that $a_l$ is the first of the a’s which is not 0. Then, assuming $x \ne 0$,

$$f(x) = \frac{a_lx^l + a_{l+1}x^{l+1} + \cdots}{b_kx^k + b_{k+1}x^{k+1} + \cdots} = x^{l-k}\left(\frac{a_l + a_{l+1}x + \cdots}{b_k + b_{k+1}x + \cdots}\right). \tag{21.3-8}$$

This shows that, as $x \to 0$, f(x) behaves like $(a_l/b_k)\,x^{l-k}$. In particular, f(x)
approaches a finite limit if $l \ge k$, but not otherwise. Let us agree to define
$f(0) = \lim_{x\to 0} f(x)$ if the limit exists. The quotient in the parenthesis in (21.3-8) can
be expressed as a single power series by the method of long division, since $b_k \ne 0$.


Example 3. Find a power series in x for $\dfrac{\sin x}{\sin 2x}$.

For $x \ne 0$ we have

$$\frac{\sin x}{\sin 2x} = \frac{x - \dfrac{x^3}{3!} + \dfrac{x^5}{5!} - \cdots}{(2x) - \dfrac{(2x)^3}{3!} + \dfrac{(2x)^5}{5!} - \cdots}.$$

We remove the common factor x from the two series, and deal by long division
with what is left. In this way we find, for $x \ne 0$,

$$\frac{\sin x}{\sin 2x} = \frac{1 - \dfrac{x^2}{6} + \dfrac{x^4}{120} - \cdots}{2 - \dfrac43 x^2 + \dfrac{4}{15}x^4 - \cdots} = \frac12 + \frac14 x^2 + \frac{5}{48}x^4 + \cdots.$$

EXERCISES 

1. Find several terms in the power series expansions of the following quotients: 


(a) 

(b) 

(c) 


2e 


l + e 3 
1 


COS X 


(d) 

(e) 


1 - cos x 
sin(x 2 ) 
- x 


cos x — sin x 


© 


log(l-x) 
x cos x - sin 


x sin x 


— ( =ctn *4) 


2. Show that ^ TZT~ = 2 (flo + ay + * • • + fl n )x M , 

n =0 1 X n =0 

and use the result to find the function represented by the following series: 


w £( 1+ Tt 

(b) e (0+1-5+j — + (-ir +, ^)x". 


(c) 2) (0 + 1 + ■ • • + n)x n . 

n —0 

3. Solve the first ( n + 1) equations in Theorem X by determinants, and express c n as 
b o (M+,) times a determinant. For the special case in which ao = 1, ai = a 2 = ■ ■ ■ = 0, show 
that 


by 

bo o 

0 . , 

.. 0 

b 2 

by bo 

0 .. 

. . 0 

bn 

b n - 1 • 


.. b 


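4. (a) Suppose that $\left(\sum_{n=0}^{\infty} b_nx^n\right)\left(\sum_{n=0}^{\infty} c_nx^n\right) = \sum_{n=0}^{\infty} a_nx^n$. Show that, as a consequence,
$\left(\sum_{n=0}^{\infty} (-1)^nb_nx^n\right)\left(\sum_{n=0}^{\infty} (-1)^nc_nx^n\right) = \sum_{n=0}^{\infty} (-1)^na_nx^n$. Take all questions of convergence for
granted.

(b) Prove, either directly or as a consequence of (a), that if $b_0 \ne 0$ and

$$\frac{\sum_{n=0}^{\infty} a_{2n}x^{2n}}{\sum_{n=0}^{\infty} b_{2n}x^{2n}} = \sum_{n=0}^{\infty} c_{2n}x^{2n},$$

then

$$\frac{\sum_{n=0}^{\infty} (-1)^na_{2n}x^{2n}}{\sum_{n=0}^{\infty} (-1)^nb_{2n}x^{2n}} = \sum_{n=0}^{\infty} (-1)^nc_{2n}x^{2n}.$$

(c) Use (b) to prove that if $\tanh x = \sum_{n=0}^{\infty} A_nx^{2n+1}$, then

$$\tan x = \sum_{n=0}^{\infty} (-1)^nA_nx^{2n+1}.$$

(d) Use (b) to prove that if $x \operatorname{ctnh} x = \sum_{n=0}^{\infty} A_nx^{2n}$, then

$$x \operatorname{ctn} x = \sum_{n=0}^{\infty} (-1)^nA_nx^{2n}.$$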


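5. Define $B_n$ as n! times the coefficient of $x^n$ in the power-series expansion of
$x/(e^x - 1)$:

$$\frac{x}{e^x - 1} = \sum_{n=0}^{\infty} \frac{B_n}{n!}\,x^n.$$

The numbers $B_n$ are called Bernoulli’s numbers.

(a) Show that $B_0 = 1$, $B_1 = -\tfrac12$, $B_2 = \tfrac16$, $B_3 = 0$.

(b) Writing

$$\frac{x}{e^x - 1} + \frac{x}{2} = 1 + \sum_{n=2}^{\infty} \frac{B_n}{n!}\,x^n,$$

show that the function on the left is equal to the even function $\dfrac{x}{2}\operatorname{ctnh}\dfrac{x}{2}$. Deduce from
this that $B_3 = B_5 = B_7 = \cdots = 0$ (see Exercise 6, §21.2). Then show that

$$x \operatorname{ctnh} x = \sum_{n=0}^{\infty} \frac{B_{2n}}{(2n)!}\,(2x)^{2n}.$$

The radius of convergence of this series is not readily determined by our
present knowledge, but it may be shown to be $\pi$.

6. (a) Use Exercises 4(d) and 5(b) to show that

$$x \operatorname{ctn} x = \sum_{n=0}^{\infty} (-1)^n \frac{B_{2n}}{(2n)!}\,(2x)^{2n}.$$

(b) Use the identity $\tan x = \operatorname{ctn} x - 2\operatorname{ctn} 2x$ to obtain the expansion

$$\tan x = \sum_{n=1}^{\infty} (-1)^{n-1} \frac{B_{2n}}{(2n)!}\,2^{2n}(2^{2n} - 1)\,x^{2n-1}.$$

(c) Use the identity $\operatorname{ctn} x + \tan\dfrac{x}{2} = \dfrac{1}{\sin x}$ to show that

$$\frac{x}{\sin x} = \sum_{n=0}^{\infty} (-1)^{n-1} \frac{B_{2n}}{(2n)!}\,(2^{2n} - 2)\,x^{2n}.$$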

21.4 / ABEL’S THEOREM

Suppose that a function is defined by a power series:

$$f(x) = \sum_{n=0}^{\infty} a_nx^n, \tag{21.4-1}$$

and suppose that the radius of convergence R of the series is positive and finite.
The series may or may not converge at $x = \pm R$. Let us suppose that the series
does converge at x = R. In §21.1 we raised the question as to whether, in this
circumstance, $f(x) \to f(R)$ as $x \to R$ (with $x < R$). We shall now answer this
question, by means of a theorem due to Abel.

THEOREM XI. If the series (21.4-1) converges at x = R, then it converges
uniformly on the closed interval $0 \le x \le R$. As a consequence, the function f
defined by the series is continuous on $0 \le x \le R$. A like conclusion holds for
$-R \le x \le 0$ if the series converges at x = -R.

The proof is made with the aid of an inequality which we state as a lemma of 
independent interest. 

LEMMA. Suppose that

$$m \le s_k \le M, \qquad k = 0, 1, \ldots, p, \tag{21.4-2}$$

where

$$s_k = u_0 + u_1 + \cdots + u_k,$$

and that

$$v_0 \ge v_1 \ge \cdots \ge v_p \ge 0. \tag{21.4-3}$$

Then

$$mv_0 \le u_0v_0 + u_1v_1 + \cdots + u_pv_p \le Mv_0. \tag{21.4-4}$$


Proof. According to (19.7-1) we have

$$u_0v_0 + \cdots + u_pv_p = s_0(v_0 - v_1) + \cdots + s_{p-1}(v_{p-1} - v_p) + s_pv_p.$$

Now, because of (21.4-2) and (21.4-3) we can write

$$u_0v_0 + \cdots + u_pv_p \le M(v_0 - v_1) + \cdots + M(v_{p-1} - v_p) + Mv_p = Mv_0.$$

The other half of inequality (21.4-4) is deduced in a similar manner.

Proof of Theorem XI. Suppose $\epsilon$ is any positive number. According to
Theorem I, §20.1, our proof will be accomplished if we show that there is some
integer N such that

$$|a_mx^m + a_{m+1}x^{m+1} + \cdots + a_{m+p}x^{m+p}| < \epsilon \tag{21.4-5}$$

if $N \le m$, $0 \le p$, and $0 \le x \le R$. Let us set

$$u_k = a_{m+k}R^{m+k}, \qquad v_k = \left(\frac{x}{R}\right)^{m+k}.$$

Then (21.4-5) is equivalent to

$$-\epsilon < u_0v_0 + u_1v_1 + \cdots + u_pv_p < \epsilon. \tag{21.4-6}$$

The conditions (21.4-3) are fulfilled by the v’s. In addition, $v_0 \le 1$. Since the
series is convergent when x = R, we can choose N so that

$$-\epsilon < a_mR^m + a_{m+1}R^{m+1} + \cdots + a_{m+k}R^{m+k} < \epsilon$$

if $N \le m$ and $0 \le k$ (this is just the Cauchy condition for convergence). This
means, in our present notation, that (21.4-2) is satisfied with $m = -\epsilon$, $M = \epsilon$.
Applying the lemma, and noting that $-\epsilon \le -\epsilon v_0$, $\epsilon v_0 \le \epsilon$, we see that the
conclusion (21.4-4) of the lemma yields (21.4-6). This completes the proof as
regards $0 \le x \le R$. The case of convergence at x = -R is reduced to the first case
by considering $g(x) = f(-x)$ at x = R.
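The inequality of the lemma, $mv_0 \le u_0v_0 + \cdots + u_pv_p \le Mv_0$, lends itself to a quick experiment. The following is a minimal sketch with illustrative names (`abel_lemma_holds` is hypothetical): it takes m and M to be the smallest and largest partial sums of the u's, uses integer data so that the comparison is exact, and tries many randomly generated nonincreasing nonnegative sequences v.

```python
import random

def abel_lemma_holds(u, v):
    """Check m*v0 <= sum(u_k * v_k) <= M*v0, with m, M the extreme partial sums of u."""
    partial_sums = []
    running = 0
    for term in u:
        running += term
        partial_sums.append(running)
    m, M = min(partial_sums), max(partial_sums)
    weighted = sum(uk * vk for uk, vk in zip(u, v))
    return m * v[0] <= weighted <= M * v[0]

rng = random.Random(0)          # fixed seed so the experiment is repeatable
trials = []
for _ in range(1000):
    p = rng.randint(1, 12)
    u = [rng.randint(-9, 9) for _ in range(p + 1)]
    v = sorted((rng.randint(0, 9) for _ in range(p + 1)), reverse=True)
    trials.append(abel_lemma_holds(u, v))
print(all(trials))
```

The monotonicity hypothesis on the v's matters: with u = [1, -1] and v = [1, 5], which is not nonincreasing, the same check fails.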

Example 1. If the binomial series (19.5-5) converges at x = 1, its sum is $2^m$.
This assertion may be justified as follows: We proved the validity of the
binomial series expansion for $(1 + x)^m$ when $|x| < 1$. Therefore, by Theorem XI,
if the series converges at x = 1, its sum there is

$$\lim_{x\to 1} (1 + x)^m = 2^m.$$

This always happens if m > 0, by what was established in Example 4, §19.4. It
may be shown that the series converges at x = 1 if $-1 < m$, but diverges if
$m \le -1$ (see Exercise 4, §19.5).


Next we show how Theorem XI permits us to extend the result of Theorem
V, §21.1, with respect to the integration of a power series. If the power series

$$f(x) = \sum_{n=0}^{\infty} a_nx^n \tag{21.4-7}$$

is convergent when $|x| < R$, then

$$\int_0^R f(x)\,dx = \sum_{n=0}^{\infty} \frac{a_n}{n+1}\,R^{n+1}, \tag{21.4-8}$$

provided the series on the right in (21.4-8) is convergent, irrespective of whether
or not the series in (21.4-7) is convergent at x = R. Of course, if (21.4-7) is not
convergent at x = R, the integral in (21.4-8) may be improper at the upper limit.
The proof of the foregoing assertion is simple. If $0 < b < R$, we have

$$\int_0^b f(x)\,dx = \sum_{n=0}^{\infty} \frac{a_n}{n+1}\,b^{n+1}$$

by (21.1-4). Then, provided the series in (21.4-8) is convergent, we have

$$\lim_{b\to R} \int_0^b f(x)\,dx = \sum_{n=0}^{\infty} \frac{a_n}{n+1}\,R^{n+1},$$

by Theorem XI. Since it is also true that

$$\lim_{b\to R} \int_0^b f(x)\,dx = \int_0^R f(x)\,dx,$$

(21.4-8) is proved.

The remarks at the end of Example 1, §21.1, illustrate the application of
(21.4-8).

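Example 2. Show that

$$\int_0^1 \frac{\log(1 - t)}{t}\,dt = -\left(\frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \cdots + \frac{1}{n^2} + \cdots\right). \tag{21.4-9}$$

We start from the series (see (19.1-2))

$$\log(1 - t) = -t - \frac{t^2}{2} - \frac{t^3}{3} - \cdots - \frac{t^n}{n} - \cdots.$$

Dividing by t, we have

$$\frac{\log(1 - t)}{t} = -1 - \frac{t}{2} - \frac{t^2}{3} - \cdots - \frac{t^{n-1}}{n} - \cdots.$$

Observe that the series diverges at t = 1. But if we integrate from 0 to x we have

$$\int_0^x \frac{\log(1 - t)}{t}\,dt = -x - \frac{x^2}{2^2} - \frac{x^3}{3^2} - \cdots - \frac{x^n}{n^2} - \cdots.$$

This series converges when x = 1, and so we have (21.4-9) as a special case of
(21.4-8). The integral is improper at t = 1, but not at t = 0, since the integrand
approaches the finite limit -1 as $t \to 0$ (this may be seen by use of L’Hospital’s
rule).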


EXERCISES 

1. (a) Express /S tan - ' t dt as a power series in x , and discuss the range of validity 
of the series, with particular attention to the end points of the interval of convergence, 
(b) Use the result in (a) to show that. 


— log V2 = 1 - 3 — g + i + 5 t* + * • •• 

2. Let a n = ^ . Show that Y a - = — — - by considering 

\ l • 4 • • ■ In } "i in — I 77 

/ 0 W2 (1 - x 2 sin 2 t) 1/2 dt as a power series in x. 

3. (a) Justify the formula 

1 L__ + ! 

iol + t q p p + <? p + 2q 


where p and q are positive integers, (b) Calculate the value of the integral to two 
decimal places, if p = 10, q = 40. 


4. Let f(x) = 2 1 ‘ Eor what positive integral values of p is it true that 


/(x)dx= !,S$ p 


np cy 

21.5 / INFERIOR AND SUPERIOR LIMITS 

The subject matter of this section is not properly a part of the theory of power 
series, but it is relevant to the problem of finding the radius of convergence of a 
power series. Inferior and superior limits have many other important uses in 
analysis. 

Consider an arbitrary sequence { x n }, without any assumption as to whether 
or not it is convergent. 

Definition. A sequence $\{x_n\}$ is said to cluster, or accumulate, at a point $\xi$ if every
open interval centered at $\xi$ contains $x_n$ for infinitely many values of n. These
points $x_n$ need not be distinct from $\xi$.

Example 1. Let $x_n = \tfrac12$ if n = 1, 4, 7, 10, ..., $x_n = 1$ if n = 2, 5, 8, 11, ..., $x_n = 2$
if n = 3, 6, 9, 12, .... This sequence clusters at the three points $\tfrac12$, 1, 2.

Example 2. Let $x_n = 1 + \sin(n\pi/2)$. This sequence clusters at 0, 1, and 2.

Example 3. Let $x_n = (-1)^n\bigl(1 + (1/n)\bigr)$. This sequence clusters at -1 and 1.

If we compare the definition of a cluster point of $\{x_n\}$ with the definition of
the limit of a convergent sequence, we see at once that if $\{x_n\}$ is convergent, with
limit $\xi$, then $\xi$ is the sole cluster point of $\{x_n\}$.

If a sequence is bounded, it must have at least one cluster point. This is a 
corollary of Theorem V, §16.31, or it may be proved by an argument much like the 
one used in proving the Bolzano-Weierstrass theorem (§16.3). 

If a sequence is bounded above and has one or more cluster points, there is a 
cluster point farthest to the right, namely the least upper bound of all the 
cluster points. This cluster point farthest to the right is called the limit superior 
of the sequence; it is denoted by 

$$\overline{\lim_{n\to\infty}}\; x_n \quad\text{or}\quad \limsup_{n\to\infty} x_n.$$

Likewise, if the sequence is bounded below and has one or more cluster points,
the cluster point farthest to the left is called the limit inferior, and denoted by

$$\underline{\lim_{n\to\infty}}\; x_n \quad\text{or}\quad \liminf_{n\to\infty} x_n.$$

We often drop the $n \to \infty$ part of the notation, as a typographical convenience.
Clearly we always have, in the case of a bounded sequence,

$$\underline{\lim}\, x_n \le \overline{\lim}\, x_n. \tag{21.5-1}$$

The sequence is convergent if and only if $\underline{\lim}\, x_n = \overline{\lim}\, x_n$, in which case the limit of
the sequence coincides with the inferior and superior limit.

If the sequence has no upper bound, we say that

$$\overline{\lim_{n\to\infty}}\; x_n = +\infty;$$

if there is no lower bound, we say that

$$\underline{\lim_{n\to\infty}}\; x_n = -\infty.$$

If $x_n \to +\infty$, we say that $\underline{\lim}\, x_n = \overline{\lim}\, x_n = +\infty$, and if $x_n \to -\infty$ we say that
$\underline{\lim}\, x_n = \overline{\lim}\, x_n = -\infty$.

One of the main reasons for introducing the new concepts of limit superior 
and limit inferior at just this point in the text is that we can use them in the 
following theorem. 

THEOREM XII. The radius of convergence R of a power series $\sum a_nx^n$ is given
by

$$\frac{1}{R} = \overline{\lim}\, |a_n|^{1/n}. \tag{21.5-2}$$

The understanding is that $R = +\infty$ if the limit superior is 0, and R = 0 if the
limit superior is $+\infty$.

Proof. We appeal to Cauchy’s root test (Theorem XVI, §19.4). Let $u_n = a_nx^n$.
Then

$$\overline{\lim}\, |u_n|^{1/n} = \overline{\lim}\, |a_nx^n|^{1/n} = |x|\, \overline{\lim}\, |a_n|^{1/n} = \frac{|x|}{R}, \tag{21.5-3}$$

where R is defined by (21.5-2). Here we have used the fact that, for any
sequence $\{y_n\}$,

$$\overline{\lim}\, cy_n = c\, \overline{\lim}\, y_n \quad\text{if } c > 0.$$

It follows from (21.5-3) and Cauchy’s root test that the series converges if
$|x| < R$ and diverges if $|x| > R$. This proves Theorem XII. We have tacitly
assumed R to be finite and positive. The cases R = 0 or $R = +\infty$ require special
attention but present no difficulties. They are left to the student.

Example 4. Find the radius of convergence of the series in (21.3-7). Here
$a_n = 2$ if n = 0, 3, 6, ... and $a_n = -1$ for other values of n. Thus $|a_n|^{1/n}$ is either $2^{1/n}$
or $1^{1/n}$, depending on the value of n. We see that $|a_n|^{1/n}$ is convergent, with limit 1,
so that R = 1, by (21.5-2). Note that R cannot be found by Theorem IV in this
case, for the successive values of $\left|\dfrac{a_n}{a_{n+1}}\right|$, for n = 0, 1, 2, ..., are 2, 1, $\tfrac12$, 2, 1, $\tfrac12$, ...,
and the sequence of ratios has no limit.
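The contrast drawn in Example 4 between the root test and the ratio test shows up immediately in a small computation. The sketch below assumes the coefficient pattern $a_n = 2$ for n divisible by 3 and $a_n = -1$ otherwise; the helper name `a` is illustrative. It tabulates $|a_n|^{1/n}$ for a few large n and the ratios $|a_n/a_{n+1}|$ for the first few n.

```python
def a(n):
    """Coefficients of the series in Example 4: 2, -1, -1, 2, -1, -1, ..."""
    return 2.0 if n % 3 == 0 else -1.0

# n-th roots of |a_n| settle down to 1, so the root formula gives R = 1.
roots = [abs(a(n)) ** (1.0 / n) for n in (10, 100, 1000, 9999)]

# Successive ratios |a_n / a_{n+1}| cycle and therefore have no limit.
ratios = [abs(a(n) / a(n + 1)) for n in range(6)]

print(roots)
print(ratios)
```

The roots approach 1 whether or not n is a multiple of 3, while the ratios keep cycling through the same three values.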

It is not difficult to prove that, if $\{a_n\}$ is any sequence of positive numbers,

$$\underline{\lim}\, \frac{a_{n+1}}{a_n} \le \underline{\lim}\, a_n^{1/n} \tag{21.5-4}$$

and

$$\overline{\lim}\, a_n^{1/n} \le \overline{\lim}\, \frac{a_{n+1}}{a_n}. \tag{21.5-5}$$

It is a consequence of these inequalities that if $\lim_{n\to\infty} a_{n+1}/a_n$ exists, then
$\lim_{n\to\infty} a_n^{1/n}$ also exists, and the two limits are equal. This is sometimes useful.

Example 5. Show that

$$\lim_{n\to\infty} \frac{n}{(n!)^{1/n}} = e. \tag{21.5-6}$$

We take $a_n = \dfrac{n^n}{n!}$. Then

$$\frac{a_{n+1}}{a_n} = \frac{(n+1)^{n+1}}{(n+1)!} \cdot \frac{n!}{n^n} = \left(\frac{n+1}{n}\right)^n = \left(1 + \frac{1}{n}\right)^n \to e,$$

as we know from (1.62-5). Therefore also

$$a_n^{1/n} = \frac{n}{(n!)^{1/n}} \to e.$$

This result is not so easy to show without using this method. The result (21.5-6) may also be established by using Stirling's formula (see §22.8).
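As a numerical illustration of Example 5, the sketch below evaluates both the ratio $a_{n+1}/a_n$ and the root $a_n^{1/n}$ for $a_n = n^n/n!$. The helper name `ratio_and_root` is hypothetical, and the use of `math.lgamma` to obtain $\log n!$ without overflow is our own choice, not part of the text.

```python
import math

def ratio_and_root(n):
    """Return (a_(n+1)/a_n, a_n^(1/n)) for a_n = n^n / n!."""
    ratio = (1.0 + 1.0 / n) ** n
    # log a_n = n log n - log n!, and lgamma(n + 1) equals log n!
    log_a_n = n * math.log(n) - math.lgamma(n + 1)
    root = math.exp(log_a_n / n)
    return ratio, root

for n in (10, 100, 10_000, 1_000_000):
    ratio, root = ratio_and_root(n)
    print(n, ratio, root, math.e)
```

Both columns approach $e$, the root somewhat more slowly than the ratio.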


EXERCISES 

1. Find the radius of convergence of 2 a n x n in each of the following cases: 

(a) a n =c y/ ", c>0. (d) a„ = n!/n". 

(b) a„ — [ 1 + (l/n)]" 2 . (e) a„ = (Vn)'". 

(c) a n = 2" if n is even, (f) a„ = n -v ". 
a„ = 3" if n is odd. 

2. If $\sum a_n x^n$ converges for all values of $x$, prove that $|a_n|^{1/n} \to 0$.

3. Prove Theorem VI, §21.2, by use of Theorem XII. 

4 . Find the radius of convergence of each of the following series: 

(a) 2 x" 2 /2". (d) 2 r"V, 0 < r < 1. 

(b) 2 n!x rt? . (e) 2r"x n2 ,0<r. 

(c) 2 n"x" 2 . (f) 2 n!x" 2 . 

5. If $\sum a_n x^n$ has a finite positive radius of convergence, prove that the radius of convergence of $\sum a_n x^{n^2}$ is $R = 1$.

6. Use the method of Example 5 to find $\lim a_n^{1/n}$ in each of the following cases:

(a) $a_n = p$, a positive constant.

(b) $a_n = n$.

(c) $a_n = \dfrac{1 \cdot 3 \cdots (2n - 1)}{2 \cdot 4 \cdots 2n}$.

7. If $\{x_n\}$ is bounded above, show that $\limsup x_n = \xi$ if and only if for each $\epsilon > 0$ it is true that $x_n < \xi + \epsilon$ for all sufficiently large values of $n$, and $\xi - \epsilon < x_n$ for infinitely many values of $n$. Devise and prove a similar statement about the limit inferior.

8. Use the formulation in Exercise 7 to prove (21.5-4) and (21.5-5). Suggestion: Use the relation

$$\frac{a_{N+p}}{a_N} = \frac{a_{N+1}}{a_N} \cdot \frac{a_{N+2}}{a_{N+1}} \cdots \frac{a_{N+p}}{a_{N+p-1}}.$$

Choose $N$ suitably, depending on $\epsilon$, and then let $p \to \infty$ after getting an appropriate inequality.


21.6 / REAL ANALYTIC FUNCTIONS 

The use of power series to represent functions is so important that it is 
convenient to have an adjective for functions which can be so represented. This 
adjective is the word “analytic.” There is, of course, a general meaning of this 
word, not solely mathematical, but we are now using the term “analytic” in a 
highly technical sense. 

Definition. Let $f$ be a real-valued function of the real variable $x$, defined on the open interval $a < x < b$, and suppose that $f$ possesses derivatives of all orders at each point of the interval. We say that $f$ is analytic on the interval if, for each point $x_0$ in the interval, $f(x)$ is represented by the Taylor's series

$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(x_0)}{n!} (x - x_0)^n \tag{21.6-1}$$

in some subinterval $x_0 - h < x < x_0 + h$, where the size of the positive number $h$ may vary with the point $x_0$.

Example 1. We see by (19.1-7) that the function e x is analytic on the whole 
x-axis. Likewise one may show that sinx and cos x are analytic on the whole 
x-axis. 

Example 2. The function $\log x$ is analytic on the whole positive x-axis. To see this we proceed as follows: Suppose $x_0 > 0$. Then

$$x = x_0 + x - x_0 = x_0\left(1 + \frac{x - x_0}{x_0}\right),$$

$$\log x = \log x_0 + \log\left(1 + \frac{x - x_0}{x_0}\right).$$

Now use the series expansion (19.1-1), with $\dfrac{x - x_0}{x_0}$ in place of $x$. The result is

$$\log x = \log x_0 + \sum_{n=1}^{\infty} (-1)^{n-1} \frac{(x - x_0)^n}{n\,x_0^n}.$$

The Taylor's series here is convergent if

$$-1 < \frac{x - x_0}{x_0} \le 1, \quad \text{or} \quad 0 < x \le 2x_0.$$
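The expansion of $\log x$ about a point $x_0 > 0$ can be checked numerically. This is a minimal sketch; the function name `log_series` and the cutoff of 200 terms are illustrative assumptions, and the test points are chosen well inside the interval $0 < x \le 2x_0$ so that the partial sums converge quickly.

```python
import math

def log_series(x, x0, terms=200):
    """Partial sum of log x0 + sum_{n>=1} (-1)^(n-1) (x - x0)^n / (n x0^n)."""
    r = (x - x0) / x0
    total = math.log(x0)
    power = 1.0
    for n in range(1, terms + 1):
        power *= r                      # power now holds r^n
        total += (-1) ** (n - 1) * power / n
    return total

print(log_series(2.5, 2.0), math.log(2.5))
print(log_series(1.0, 2.0), math.log(1.0))
```

Near the end-points of the interval of convergence many more terms would be needed for comparable accuracy.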

Most of the basic elementary functions and the usual combinations of them are analytic on any open interval on which they are defined. We say that $f$ is analytic at a point $x_0$ if it is analytic in some open interval centered at $x_0$. If $f$ is not analytic at $x_0$, but is analytic at all points near $x_0$, we call $x_0$ an isolated singular point of $f$.

Example 3. The function $1/(1 - x^2)$ has $x = 1$ and $x = -1$ as isolated singular points. It is analytic everywhere else, as may be shown with the aid of the binomial series.

If two functions $f$ and $g$ are each analytic on an interval $a < x < b$, the sum $f(x) + g(x)$ and the product $f(x)g(x)$ are each analytic on that interval. The quotient $f(x)/g(x)$ is analytic at each point of the interval at which $g(x) \ne 0$. Under appropriate conditions the formation of a composite function from analytic functions yields another analytic function.

Example 4. The function $e^{\sin x}$ is analytic for all values of $x$, and $\log(\cos x)$ is analytic for values of $x$ such that $\cos x > 0$.

A function defined by a power series is analytic at each point in the interior 
of the interval of convergence of the power series. 

Some of the facts we have been stating may be proved rather easily with 
what we have learned in Chapter 19 and in the present chapter. The fact that a 
composite function of analytic functions is analytic is somewhat harder to prove. 
The analyticity of the quotient of two analytic functions is a special case of the 
result for composite functions. We omit these proofs. The theory of real analytic 
functions is much clarified and simplified by a study of the theory of analytic 
functions of a complex variable. 

It is instructive to know that a function can fail to be analytic at a certain point even though it is everywhere continuous, with a continuous derivative. The function $|x|^{3/2}$ is such a function: it is not analytic at $x = 0$, because it does not have a second derivative at $x = 0$. Much more surprising is an example of a function which has continuous derivatives of all orders for all values of $x$, yet is not analytic at $x = 0$. The function

$$f(x) = e^{-1/x^2} \ \text{ if } x \ne 0, \qquad f(0) = 0$$

is such a function. It is shown in Example 6, §4.5, that $f$ and all its derivatives have the value 0 at $x = 0$. It follows at once that $f(x)$ is not represented by its Taylor's series about the point $x = 0$, for

$$\sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n = 0$$

for all values of $x$, since $f^{(n)}(0) = 0$, whereas $f(x) \ne 0$ if $x \ne 0$.
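The flatness of this function at the origin can be seen numerically: near $x = 0$ it is smaller than any fixed power of $x$, which is consistent with all its derivatives vanishing there, yet it is positive away from 0. The sketch below is only an illustration; the sample point 0.05 and the exponents are arbitrary choices.

```python
import math

def f(x):
    """f(x) = exp(-1/x^2) for x != 0, and f(0) = 0."""
    return math.exp(-1.0 / (x * x)) if x != 0 else 0.0

# f(h) / h^k stays extremely small for small h, whatever the power k.
h = 0.05
for k in (1, 5, 10):
    print(k, f(h) / h ** k)

# Away from the origin f is positive, so it cannot equal its (zero) Taylor series.
print(f(0.5))
```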

The following theorem about real analytic functions is interesting and useful:

THEOREM XIII. Suppose $f(x)$ is defined and has derivatives of all orders when $a < x < b$, and suppose that $f^{(n)}(x) \ge 0$ for $n = 0, 1, 2, \ldots$ when $a < x < b$. Then $f$ is analytic at each point of the interval, and the Taylor's series about the point $x_0$ converges to $f(x)$ for each $x$ such that $|x - x_0| < h$, where $h$ is the distance from $x_0$ to the nearer end-point of $(a, b)$.

We shall not give the proof of this theorem; the proof is easily deduced from the result stated in Exercise 3.

EXERCISES 

1. What does Theorem XIII imply (a) about e xr t (b) about (l-x) _m if m >0 and 

x < 1? (c) about log( 1 - x) if x < 1? (d) about sin x + e x a if x ^ a? 

2. By induction, or otherwise, prove the formula 

n „k»(k)/A\ „n + l r 1 

= 4> 0,+,) 

assuming that <f> and all the derivatives occurring are continuous on the interval from 0 to 
x, inclusive. 

3. Assume that <f> and all its derivatives are defined and nonnegative in value when 
0 ^ x ^ r. Prove by the following steps that <f>(x) can be expanded by Maclaurin’s series if 
0 ^ x < r. 

(a) Use the formula in Exercise 2 to show that 

Jo T 

(b) Use the fact that <f> (n+2 \x) is a nonnegative function and the result in (a) to prove that 
the integral remainder term in the formula of Exercise 2 is not greater than (x/r) n+1 <£(r) if 
O^x <r. Now complete the proof of the main assertion. 

4. Use the result in Exercise 3 to show that $\tan x$ can be expanded by Maclaurin's series if $0 \le x < \pi/2$. What can you conclude about the expansion for $x < 0$, seeing that $\tan x$ is an odd function?

1 T x 

5. What can you infer about log — ^ from Exercise 3? 


MISCELLANEOUS EXERCISES 

1. Prove the binomial coefficient relation 


©■♦Gy-’-cy-ff) 


using Theorem IX and the fact that (1 + x)”(l + x) n — (1 + x) 2 ”. 

2. Deduce the formula 

sin -1 x , 23 , 2-4 5 , 2 • 4 * 6 7 , 

= X + 3X + X + X + 


and obtain a series for |(sin 1 x) 2 . Is the latter series valid when x = 1? 

3. Deduce the formula 

(tan~'x) 2 x 2 , ux 4 . , i , kX 6 

2! = T -(1 + 3) 7 + (1 + 5 + 5) 6" ' 





4. Deduce the formulas 

- lo fl~ :t) -x + ( 1 + V+0 + l+i)x 3 + -, 

Prove the validity of the latter series when x = -1. (Use Euler’s constant.) 

5. Suppose that $\{x_n\}$ is a sequence with $x_n > 0$. Let

$$\limsup_{n\to\infty} x_n = A, \qquad \liminf_{n\to\infty} \frac{1}{x_n} = B.$$

Show that $B = 1/A$ if $A$ is finite and positive. Show also that $B = 0$ if $A = +\infty$, and that $B = +\infty$ if $A = 0$.

6. Apply the results of the preceding exercise to show that (21.5-2) can be replaced by the formula

$$R = \liminf_{n\to\infty} \frac{1}{|a_n|^{1/n}}.$$



22 / IMPROPER 
INTEGRALS 


22 / PRELIMINARY REMARKS 


In this chapter we shall study improper integrals systematically in somewhat the same way that we studied infinite series in Chapter 19. There are many analogies between the theory of improper integrals and the theory of infinite series. These analogies are seen most clearly in comparing

$$\int_0^\infty f(x)\,dx \quad \text{and} \quad \sum_{n=0}^{\infty} a_n.$$

In the integral we have a variable $x$, ranging continuously from 0 to $\infty$, while in the series we have a variable $n$, with the discrete range of values $0, 1, 2, \ldots$. The typical function value $f(x)$ is the counterpart of the typical term $a_n$, and the integration symbol $\int_0^\infty (\ )\,dx$ is the counterpart of the summation symbol $\sum_{n=0}^{\infty}$.

The counterpart of a partial sum $\sum_{k=0}^{n} a_k$ is the "partial integral"

$$\int_0^x f(t)\,dt.$$

Definition. By an improper integral of first kind we mean an integral

$$\int_a^\infty f(x)\,dx \tag{22-1}$$

in which $f(x)$ is defined when $x \ge a$ and is integrable (in the sense of §18.1) over every finite interval $[a, b]$. The integral is defined as the limit

$$\lim_{b\to\infty} \int_a^b f(x)\,dx$$

if the limit exists. The integral (22-1) is then said to be convergent. If the limit does not exist, the integral is called divergent. In most of the applications and illustrations $f(x)$ is continuous. The integrals

|°°p. J* sin t 2 dt, j 0 x"e~ x dx (ns? 0) 

are all of first kind. 

Definition. By an improper integral of second kind we mean an integral

$$\int_a^b f(x)\,dx \tag{22-2}$$

with finite limits, in which the failure to be an ordinary "proper" integral arises from the behavior of $f(x)$ either as $x \to a$ or as $x \to b$, but not both. Thus, if $f(x)$ is integrable on $[a, c]$ for each $c$ such that $a < c < b$, but is not integrable on $[a, b]$, we say that the integral (22-2) is improper at $x = b$. Sometimes we say that $f(x)$ has a singularity at $x = b$. We then define the integral (22-2) as the limit

$$\lim_{c\to b^-} \int_a^c f(x)\,dx$$

if the limit exists. The terms convergent and divergent are applied to the integral according as the limit does or does not exist. Similar definitions are made for integrals of the second kind which are improper at the lower limit of integration.


Examples . I 

J n 


dx 


0 Vl~ 



de 

Vcos 0 -\ 


f 1 log X 

ln(\~xf n 


dx are improper at the 


upper limits, and 


i: 




r 1 logx 

'o 1 + X 


dx 


are improper at the lower limits. 

As with infinite series, it is important to be able to test an improper integral 
for convergence or divergence. There are certain analogies between the tests for 
series and tests for integrals, which we shall point out as we proceed. In 
practice, however, we do not need as great a variety of tests for integrals as we 
do for series. 

Just as certain functions may be represented by infinite series whose terms depend on a variable, so certain functions may be represented by improper integrals whose integrands depend on a parameter. As examples, we cite particularly the gamma function $\Gamma(x)$, defined by

$$\Gamma(x) = \int_0^\infty e^{-t} t^{x-1}\,dt, \qquad 0 < x,$$

integrals of the form

$$f(s) = \int_0^\infty e^{-st} F(t)\,dt,$$

which are known as Laplace transforms, and functions defined by integrals of either of the forms

$$\sqrt{\frac{2}{\pi}} \int_0^\infty f(t) \sin xt\,dt, \qquad \sqrt{\frac{2}{\pi}} \int_0^\infty f(t) \cos xt\,dt,$$

which are Fourier transforms. Laplace and Fourier transforms are of great importance, both theoretically and practically.

22.1 / POSITIVE INTEGRANDS. INTEGRALS OF THE FIRST KIND

It is convenient to begin by studying improper integrals with positive 
integrands, for the same reasons that make it useful to study series of positive 
terms before studying more general types of series. 

Let us start with an integral

$$\int_a^\infty f(t)\,dt \tag{22.1-1}$$

of the first kind. This is convergent if the integral

$$F(x) = \int_a^x f(t)\,dt \tag{22.1-2}$$

approaches a finite limit as $x \to \infty$; otherwise it is divergent. Now let us suppose that $f(t) \ge 0$ when $t \ge a$. Usually we shall have $f(t) > 0$, but zero values for $f(t)$ need not be ruled out here. Then it is clear that $F(x)$ does not decrease as $x$ increases, for

$$F(x_2) - F(x_1) = \int_{x_1}^{x_2} f(t)\,dt \ge 0$$

if $a \le x_1 < x_2$. There are now two possibilities: either $F(x)$ is bounded above or it is not. If it is bounded above, there is some constant $M$ such that $F(x) \le M$ for all the relevant values of $x$. If it is not bounded above, for each $M$ there will be some $x$ such that $M < F(x)$, no matter how large $M$ is, and since $F(x)$ never decreases as $x$ increases, this means that $F(x) \to +\infty$ as $x \to \infty$. In this case the integral is divergent. If $F(x)$ is bounded, however, then $F(x)$ approaches a finite limit as $x \to \infty$, and in this case the integral is convergent. The fact that a function $F(x)$ approaches a limit as $x \to \infty$ if it is bounded and nondecreasing as $x$ increases is analogous to a corresponding theorem about sequences (Theorem III, §2.7), and may be proved in an analogous way (see Exercise 9). We state our conclusion about the integral formally:

THEOREM I. An integral (22.1-1) of the first kind with $f(t) \ge 0$ for all $t$ is convergent if and only if there is a constant $M$ such that

$$\int_a^x f(t)\,dt \le M$$

when $x > a$. The value of the improper integral is then not greater than $M$.

This theorem is the counterpart of Theorem I, §19.2. There is also a counterpart of Theorem II, §19.2; we call it the comparison test for integrals.

THEOREM II. Let $\int_a^\infty f(x)\,dx$ and $\int_b^\infty g(x)\,dx$ be two integrals of first kind with nonnegative integrands, and suppose that $f(x) \le g(x)$ for all values of $x$ beyond a certain point $x = c$. Then if $\int_b^\infty g(x)\,dx$ is convergent, so is $\int_a^\infty f(x)\,dx$, and if $\int_a^\infty f(x)\,dx$ is divergent, so is $\int_b^\infty g(x)\,dx$.

For the proof we note that the convergence or divergence of the integrals is not affected if we replace both lower limits by $x = c$. We then have, if $x > c$,

$$\int_c^x f(t)\,dt \le \int_c^x g(t)\,dt;$$

the conclusions of the theorem now follow at once from Theorem I.


Example 1. The integral $\displaystyle\int_0^\infty \frac{dx}{\sqrt{1 + x^3}}$ is convergent, by comparison with the integral $\displaystyle\int_1^\infty \frac{dx}{x^{3/2}}$, which is convergent, as may be shown directly from the definition; for $(1 + x^3)^{-1/2} < x^{-3/2}$ when $x > 0$.


Example 2. The integral $\displaystyle\int_0^\infty \frac{dx}{(1 + x^3)^{1/3}}$ is divergent, by comparison with the integral $\displaystyle\int_0^\infty \frac{dx}{1 + x}$, which is divergent.

To see the divergence of the second integral, note that

$$\int_0^x \frac{dt}{1 + t} = \log(1 + x) \to \infty \quad \text{as } x \to \infty.$$

Now

$$\frac{1}{1 + x} < \frac{1}{(1 + x^3)^{1/3}}$$

when $x > 0$, for this inequality is equivalent to $1 + x^3 < (1 + x)^3 = 1 + 3x + 3x^2 + x^3$, which is obviously correct if $x > 0$. Thus the first integral must diverge, by Theorem II.

To avoid troublesome details of working with inequalities in practice, it is 
often convenient to use the following theorem rather than to use the comparison 
test directly. 


THEOREM III. Suppose $\int_a^\infty f(x)\,dx$ and $\int_b^\infty g(x)\,dx$ are integrals of the first kind with positive integrands, and suppose that the limit

$$\lim_{x\to\infty} \frac{f(x)}{g(x)} = L \tag{22.1-3}$$

exists (finite) and is not zero. Then either both integrals are convergent, or both are divergent.

This is proved in exactly the same manner as we proved its counterpart for series, Theorem III, §19.2.

Example 3. The integral $\displaystyle\int_1^\infty \frac{x^2\,dx}{2x^4 - x + 1}$ is convergent. To prove this, observe that for large values of $x$ the integrand is about the size of $1/(2x^2)$. More exactly, taking

$$f(x) = \frac{x^2}{2x^4 - x + 1}, \qquad g(x) = \frac{1}{x^2},$$

we see that

$$\lim_{x\to\infty} \frac{f(x)}{g(x)} = \frac{1}{2}.$$

Now the integral $\int_1^\infty (1/x^2)\,dx$ is convergent, and so the given integral is also, by Theorem III.
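The limit test of Example 3 can be illustrated numerically. The sketch below is not part of the text's argument: the helper `partial_integral` is a hypothetical midpoint-rule routine, and the upper limits are arbitrary. It prints the ratio $f(x)/g(x)$ for growing $x$ and compares the partial integrals of $f$ and $g$ over $[1, b]$.

```python
def f(x):
    return x**2 / (2 * x**4 - x + 1)

def g(x):
    return 1.0 / x**2

def partial_integral(func, a, b, steps=20_000):
    """Midpoint-rule estimate of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

# The ratio f/g approaches L = 1/2.
for x in (10.0, 100.0, 1000.0):
    print(x, f(x) / g(x))

# Partial integrals of f stay below those of g, which are bounded by 1.
for b in (10.0, 100.0, 1000.0):
    print(b, partial_integral(f, 1.0, b), partial_integral(g, 1.0, b))
```

Bounded, nondecreasing partial integrals are exactly what Theorem I requires for convergence.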


It is convenient to state an additional theorem about what conclusion can be drawn if $L = 0$ or $L = +\infty$ in (22.1-3).

THEOREM IV. For the integrals described in Theorem III suppose that the limit $L = 0$. Then if $\int_b^\infty g(x)\,dx$ is convergent, so is $\int_a^\infty f(x)\,dx$. Or, alternatively, suppose that $L = +\infty$. Then if $\int_b^\infty g(x)\,dx$ is divergent, so is $\int_a^\infty f(x)\,dx$.

The proof comes directly from Theorem II. For, in the first case we must have $f(x) < g(x)$ beyond a certain point, and in the second case it is clear that $g(x) < f(x)$ beyond a certain point. Nevertheless, it is usually easier to find $L$ than to deal directly with the inequalities.


Example 4. The integral $\int_1^\infty x^a e^{-x}\,dx$ is convergent, no matter what real number $a$ may be. We apply the limit test, using the fact that $\int_1^\infty (1/x^2)\,dx$ is convergent. With $f(x) = x^a e^{-x}$ and $g(x) = 1/x^2$ we have

$$\frac{f(x)}{g(x)} = \frac{x^a e^{-x}}{x^{-2}} = x^{a+2} e^{-x} = \frac{x^{a+2}}{e^x}$$

and

$$\lim_{x\to+\infty} \frac{x^{a+2}}{e^x} = 0,$$

as we see by applying l'Hospital's rule (compare with Example 5, §4.5). The convergence of the given integral now follows by Theorem IV.

In using Theorems III and IV the student needs to have in mind a few simple standard integrals whose convergence or divergence is known. It is easily shown that $\int_a^\infty (1/x^p)\,dx$ (where $a > 0$) is convergent if $p > 1$ and divergent if $p \le 1$. In a very large number of practical situations it will be found that the convergence or divergence of an integral $\int_a^\infty f(x)\,dx$ of the first kind with positive integrand can be settled by using Theorem III or Theorem IV with $g(x) = 1/x^p$, choosing an appropriate value of $p$ as determined by trial or inspection.
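The statement about the standard integrals can be verified directly from the definition. The following short computation is supplied here as a worked check; the upper limit is written $b$. For $p \ne 1$ and $0 < a < b$,

$$\int_a^b \frac{dx}{x^p} = \frac{b^{1-p} - a^{1-p}}{1 - p}.$$

If $p > 1$, then $b^{1-p} \to 0$ as $b \to \infty$, and the integral converges to $\dfrac{a^{1-p}}{p - 1}$. If $p < 1$, then $b^{1-p} \to +\infty$, and the integral is divergent. If $p = 1$,

$$\int_a^b \frac{dx}{x} = \log b - \log a \to +\infty \quad \text{as } b \to \infty,$$

so the integral is divergent in this case also.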


Example 5. Let f(x) = For large values of x this function is 


(x + 1) : 


comparable in value to 3x1 x 51 = 3 lx 31 . Thus, applying Theorem III with g(x) = 
r® 3x_7 

we see that - L rrm dx is convergent. 

Jo (x + 1) 
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Example 6 Consider the integral J ’ where p > 0. In this case, we 

know that $\log x$ increases more slowly than any positive power of $x$. We apply Theorem IV with $f(x) = (\log x)^{-p}$, $g(x) = x^{-1}$. Then, using l'Hospital's rule,

$$\lim_{x\to\infty} \frac{f(x)}{g(x)} = \lim_{x\to\infty} \frac{x}{(\log x)^p} = \lim_{x\to\infty} \frac{1}{p(\log x)^{p-1}(1/x)}.$$

This is the same as

$$\lim_{x\to\infty} \frac{x}{p(\log x)^{p-1}}.$$

After a certain number of applications of l'Hospital's rule we find that

$$\lim_{x\to\infty} \frac{f(x)}{g(x)} = +\infty.$$

(One must consider separately the cases in which $p$ is or is not an integer.) Theorem IV then assures us that the given integral is divergent.


The student will note that we have not developed any analogues of the ratio tests of §19.22. In the analogy between series and integrals there is no simple way of formulating a counterpart of a ratio test, because a typical value $f(x)$ of the integrand has no immediate successor, in the sense that $a_{n+1}$ is the successor of $a_n$.

One other difference between infinite series and improper integrals of the first kind is worth noting. If a series is convergent, its typical term $a_n$ approaches zero as $n \to \infty$. But if an integral $\int_a^\infty f(x)\,dx$ is convergent, it does not necessarily follow that $f(x) \to 0$ as $x \to \infty$. See Exercise 8, and Example 2, §22.3.


EXERCISES 

1. Test the following integrals for convergence or divergence, using Theorem II and 
the known facts about J" x~ p dx for a > 0. 


(a) f 

J i 


dx 


(b) 

(c) 


(\ + x)Vx 
x dx 




0 (1 + x) 2 (2 + Vx) 
x + 2 


dx. 



(Vx-l)Vx 3 -l 

2. Establish the facts about the values of the exponent p for which the integral 
dx 


x(x + 1) 
bli 

i xF is convergent ( a > 1). Then use either Theorem II or Theorem III to test the 

Ja x(logx) p 

following integrals, using the foregoing integral as a standard, with an appropriate value 
of p in each case. 


(a 


>/; 


dx 


Vx 2 + lflog(l+x)f 


(b) 


i 


(x + 1) dx 


(x 2 -2)(log|-l) 
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3. Find whether each of the following integrals is convergent or divergent: 


(a) f 

Jo 


(b) 

(c) 


(1 + X) 

r oc 


dx. 


dx. 


/; 


16 + x 


4 dx. 


» r 

<«> /: 

* /; 

« n 

(K)f 


*T x2 dx. 


x 2 e“ x2 dx. 


Vx 


a + *y 


tan x 


+ x" 


dx. 


dx. 


dx 


(i) f e V(1og x) 3 dx. 

Jo 

a) /; 

(k > r 

(■) l (f-tan-'x)dx. 

(m) L 


. 1 
sin- 
x 


TT . — ! 

y - tan x 


7T . -1 

y - tan X 


logx 


2 Vx(logx) 3 
4. For what values of 


(n) 

(o) 

<p> f 


f oc cos 
J2/1T 

/ 4 (x- 


(!-;) 


dx. 

dx 


1) log(x - 2) • log(log x) 
dx 


Vl + X 2 log x[log(log x)] : 


x“ _1 

a is J — dx convergent? 


5. Show that jo x n e x dx is convergent if niO. Find the value of the integral if 
n = 2m + 1, where m is a nonnegative integer. 

6. Find the conditions on m and n which guarantee that JT e~*x m (log x) n dx is a 
convergent integral of the first kind. 


7. Let P(x) and Q(x) be polynomials of degrees m and n, respectively, and suppose 
that r is the largest real root of the equation Q(x) = 0. What is the necessary and sufficient 

P( X ) 

condition on m and n to make the integral J -^~dx converge, if r<c? Why is the 

integrand of constant sign if x is large enough? 

8. Suppose 0 < a n < v and b n > 0, n = 0, 1, 2, Define a function /(x) for x ^ 0 as 

follows: /(n)=h„, n = 0, 1, 2, . . . , /(x) = 0 if n - 1 + a n = x ^ n - a n +i and n = 

1,2,..., f(x) continuous for all x ^ 0, and a linear function on each of the intervals where 
it has not already been defined, (a) Sketch the graph of the function enough to show its 
general character, (b) Show that jo /(x) dx is convergent if and only if the series 
2n = 1 a n +ib n is convergent, (c) Specialize the sequences {a„} and {b„} so as to obtain a 
convergent integral and yet have b n This shows that the integral may converge even 
though /(x) is unbounded. 

9. Suppose $F(x)$ is defined when $x \ge x_0$, and that: (a) $F(x) < M$ for all such $x$, where $M$ is a constant; (b) $F(x_1) \le F(x_2)$ if $x_0 \le x_1 < x_2$. Prove that $\lim_{x\to\infty} F(x) = A$, where $A$ is the least upper bound of the values of $F(x)$.

10. Prove the following theorem: If fix) > 0, if fix) decreases steadily as x increases, 
and if fa fix) dx is convergent, then lim,^<»x/(x) = 0. Suggestion: First prove that 
ffnfit) dt 0 as x-*a>. Then use (18.1-3). 
22.11 / INTEGRALS OF THE SECOND KIND

Let us consider integrals of the second kind with positive (or nonnegative) integrands. Theorems I-IV have exact analogues whose wordings differ but slightly from the statements of these theorems given in §22.1. Suppose, for example, that we are dealing with integrals improper at the upper limit,

$$\int_a^b f(t)\,dt \qquad (b > a) \tag{22.11-1}$$

with $f(t) \ge 0$. Then, for $a \le x < b$,

$$F(x) = \int_a^x f(t)\,dt$$

does not decrease as $x$ increases, and the integral (22.11-1) is convergent if and only if $F(x)$ is bounded; in which case the value of the integral is $\lim_{x\to b^-} F(x)$. This result enables us to prove a comparison test strictly parallel to Theorem II, from which in turn we deduce limit tests in which the convergence or divergence of two integrals of the same type,

$$\int_a^b f(x)\,dx, \qquad \int_a^b g(x)\,dx,$$

are related by an examination of the limit

$$\lim_{x\to b^-} \frac{f(x)}{g(x)}.$$

Entirely similar results obtain for integrals of the second kind improper at the lower limit of integration.

For integrals of the second kind the basic standard reference integrals are

$$\int_a^b \frac{dx}{(b - x)^p}, \qquad \int_a^b \frac{dx}{(x - a)^p}. \tag{22.11-2}$$

These integrals are convergent if $p < 1$, and divergent if $p \ge 1$. If $p \le 0$ they are proper integrals, with no singularities of the integrands.
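The behavior of the first of these reference integrals can be checked from the definition; the computation below is supplied as a worked check, with $c$ denoting a point such that $a < c < b$. For $p \ne 1$,

$$\int_a^c \frac{dx}{(b - x)^p} = \frac{(b - a)^{1-p} - (b - c)^{1-p}}{1 - p}.$$

If $p < 1$, then $(b - c)^{1-p} \to 0$ as $c \to b^-$, and the integral converges to $\dfrac{(b - a)^{1-p}}{1 - p}$. If $p > 1$, then $(b - c)^{1-p} \to +\infty$, and the integral is divergent. If $p = 1$,

$$\int_a^c \frac{dx}{b - x} = \log\frac{b - a}{b - c} \to +\infty \quad \text{as } c \to b^-.$$

The second integral is handled in the same way, with $x - a$ in place of $b - x$.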

Example 1. The integral

$$\int_0^1 \frac{dx}{(1 - x^3)^{1/3}}$$

is convergent. For, let

$$f(x) = \frac{1}{(1 - x^3)^{1/3}} = \frac{1}{(1 - x)^{1/3}(1 + x + x^2)^{1/3}}, \qquad g(x) = \frac{1}{(1 - x)^{1/3}}.$$

We have

$$\lim_{x\to 1^-} \frac{f(x)}{g(x)} = \lim_{x\to 1^-} \frac{1}{(1 + x + x^2)^{1/3}} = \frac{1}{3^{1/3}}.$$

Consequently, since $\int_0^1 g(x)\,dx$ is of the type (22.11-2) with $p < 1$, the given integral is convergent (by the counterpart of Theorem III, §22.1).

We often have occasion to recall the fact that $\sin x$ is approximately equal to $x$ when $x$ is small; more precisely,

$$\lim_{x\to 0} \frac{\sin x}{x} = 1.$$

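Example 2. For what values of $p$ is $\displaystyle\int_0^{\pi/2} \frac{\sin x}{x^p}\,dx$ convergent?

Here there is a singularity at $x = 0$ if $p > 1$; if $p \le 1$ the integrand is bounded and the integral is proper. We write

$$\frac{\sin x}{x^p} = \frac{\sin x}{x} \cdot \frac{1}{x^{p-1}}$$

and take

$$f(x) = \frac{\sin x}{x^p}, \qquad g(x) = \frac{1}{x^{p-1}},$$

so that

$$\lim_{x\to 0^+} \frac{f(x)}{g(x)} = 1.$$

The integral $\displaystyle\int_0^{\pi/2} \frac{dx}{x^{p-1}}$, and therefore also the integral $\displaystyle\int_0^{\pi/2} \frac{\sin x}{x^p}\,dx$, is convergent if $p - 1 < 1$, or $p < 2$. Both integrals are divergent if $p \ge 2$.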
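The dividing line at $p = 2$ can be observed numerically by integrating over $[\epsilon, \pi/2]$ and letting $\epsilon$ shrink. The sketch below is an illustration only: the function name `tail_integral`, the change of variable $x = e^s$ (used so that the midpoint rule resolves the region near 0), and the sample values of $p$ and $\epsilon$ are all our own choices.

```python
import math

def tail_integral(p, eps, steps=20_000):
    """Midpoint estimate of the integral of sin(x)/x^p over [eps, pi/2],
    computed after the change of variable x = exp(s)."""
    lo, hi = math.log(eps), math.log(math.pi / 2)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = math.exp(lo + (i + 0.5) * h)
        total += math.sin(x) * x ** (1.0 - p)   # sin(x)/x^p times dx/ds = x
    return total * h

for p in (1.5, 2.0):
    print(p, [tail_integral(p, eps) for eps in (1e-4, 1e-6, 1e-8)])
```

For $p = 1.5$ the values settle down as $\epsilon \to 0$; for $p = 2$ they keep growing, roughly like $\log(1/\epsilon)$.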

It is worth observing that an integral of second kind can be transformed into an integral of first kind by a simple substitution. For an integral $\int_a^b f(x)\,dx$ with singularity at $x = b$ we can let

$$y = \frac{1}{b - x}, \qquad x = b - \frac{1}{y}, \qquad dx = \frac{dy}{y^2}.$$

As $x \to b^-$, $y \to +\infty$, and we get an integral of the form $\displaystyle\int_c^\infty \phi(y)\,\frac{dy}{y^2}$ with $c = (b - a)^{-1}$.

Example 3. Transform $\displaystyle\int_0^1 \log\frac{1}{1 - x}\,dx$.

We set $y = (1 - x)^{-1}$ and obtain

$$\int_0^1 \log\left(\frac{1}{1 - x}\right) dx = \int_1^\infty \frac{\log y}{y^2}\,dy.$$

This integral of first kind is convergent, as may be verified by the limit test of Theorem IV, with $g(y) = y^{-3/2}$. The convergence of the original integral could be established directly by a limit test with $g(x) = (1 - x)^{-1/2}$. We leave it for the student to verify by l'Hospital's rule that

$$\lim_{x\to 1^-} \frac{\log\dfrac{1}{1 - x}}{(1 - x)^{-1/2}} = 0.$$

Other changes of variable may also be used.
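The effect of the substitution $y = (1 - x)^{-1}$ can be checked with closed-form partial integrals. The antiderivatives used below are not given in the text; they are our own and can be confirmed by differentiation. The function names are hypothetical.

```python
import math

def second_kind_partial(c):
    """Integral of log(1/(1 - x)) over [0, c], 0 <= c < 1.
    Antiderivative x + (1 - x) log(1 - x) (our own, checked by differentiation)."""
    return c + (1.0 - c) * math.log(1.0 - c)

def first_kind_partial(Y):
    """Integral of log(y)/y^2 over [1, Y], Y >= 1.
    Antiderivative -(1 + log y)/y (our own, checked by differentiation)."""
    return 1.0 - (1.0 + math.log(Y)) / Y

# Matching upper limits under y = 1/(1 - x) give the same partial integral,
# and both approach the common value 1.
for c in (0.9, 0.999, 0.999999):
    Y = 1.0 / (1.0 - c)
    print(c, second_kind_partial(c), first_kind_partial(Y))
```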
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Example 4 . Show that 

(log!)’ *»['*-*. (22.11-3) 

Here we set $t = \log(1/u)$, or $u = e^{-t}$, $du = -e^{-t}\,dt$. As $u \to 0$, $t \to +\infty$, and as $u \to 1$, $t \to 0$, whence the result follows as stated. The integral on the right in (22.11-3) has already been proved convergent (Example 4, §22.1).


EXERCISES 

1. Determine the convergence or divergence of each of the following integrals by 
comparison with an appropriate one of the integrals 


[ x p dx, f (b -x) q dx , 
Jo Jo 


using the known facts as to the convergence or divergence of these latter integrals. 
,, f' (l + 2x)Vl + x* . ... r /2 Vx dx 

3 Jo 1 -x 2 x ' Jo (x + sin x)(l + x 2 ) 

( l xdx , . f 1 dx 

(b) Jo 

(c) f 


‘o Vl-x 

4(8-x 3 ) 

(2x-xY d 


(6) /„ Vx(x + 2x 2 ) m 

«>£<s+ x ) 


1/2 1/3 

r~2dx. 


X ~2X 


2. Test the following integrals for convergence of divergence by methods analogous 
to those of Theorems III or IV, §22.1. 


"■>/. 


dx 


V(x — 1)(3 — x) 
10 dx 


(c) 

(d) 

(e) 


xVx*- 1 


f 2 x dx 

Jo (16 -x 4 ) 1/3 

f l dx 

Jo x 2/3 (l + x) 


f 5 V25-V J 

7 dx. 

J 3 x - x - 6 


(g) f 

J o 

(h) f 

Jo 

0 >f 

J o 

»>/, 


(sin x) 


dx. 


x 3 dx 


sin(x 2 )(tan x) 3 
Vx i -2x i -4x + 8 


dx. 


logx 


Vx 
2 Vx 
logx 


dx. 


dx. 


<*> / 

J o 


1 x 3 sin 1 x 


Vl- 


dx. 


( 1 ) 




1/2 


log(l + x ,/3 ) 


dx. 


sin x 


3. Give necessary and sufficient conditions on p and q for J (sinjc ") ^ ' t0 a 

convergent improper integral of second kind. 

J ' t x a ^ l + x _a 

— — dx a proper integral? For what 

o l + x 

values of a is the integral improper, but convergent? 

5. Consider 

J\ 1 - x 2 )- ,,2 (l - X 3 )-' 13 • • • (1 - x n r lln dx, n a 2. 

Find all values of n for which the integral is convergent. 
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6. For what values of x is fo e~ f t x ~ l dt (a) proper? (b) improper, but con- 
vergent? 

f 1 (-log x) p 

7. Show that dx is convergent if p ^ 0 and 0 < q < 1. 

Jo x 

8. For what values of p is J7 (logx)~ p dx improper, but convergent? Assume a > 1. 

9. Prove that $\int_0^{\pi/2} \log(\sin x)\,dx$ is convergent. What can you say about $\int_0^{\pi/2} \log(\cos x)\,dx$?

10. If $f(u)$ is continuous, $0 \le u \le 1$, show that $\displaystyle\int_0^1 \frac{f(u)}{\sqrt{1 - u^2}}\,du$ is convergent. Show also that the substitution $u = \sin\theta$ transforms the given integral into a proper integral.

11. By a suitable change of variable transform fi x~ n e~ Vx dx into an improper 
integral of first kind, and show that it is convergent for all values of n . 

f 1 x p 

12. In the integral J dx, use the power-series expansion of log(l + x) to 

show that the integral is proper if p ^ q, improper but convergent if p <q<p + 1, and 
divergent if q ^ p + 1. 


22.12 / INTEGRALS OF MIXED TYPE 

Many improper integrals occurring in practice are of mixed type.

Example 1. Consider $\displaystyle\int_0^\infty \frac{dx}{(1 + x)\sqrt{x}}$. This is of mixed type, with infinite upper limit, and a singularity of the integrand at the lower limit. There are no other singularities, so we consider the separate integrals

$$\int_0^1 \frac{dx}{(1 + x)\sqrt{x}}, \qquad \int_1^\infty \frac{dx}{(1 + x)\sqrt{x}},$$

of second and first kinds, respectively. The first integral is convergent, as may be shown by using a limit test to compare it with the convergent integral $\displaystyle\int_0^1 \frac{dx}{\sqrt{x}}$. The second integral is also convergent, since the integrand behaves essentially like $x^{-3/2}$ as $x \to \infty$. We then write

$$\int_0^\infty \frac{dx}{(1 + x)\sqrt{x}} = \int_0^1 \frac{dx}{(1 + x)\sqrt{x}} + \int_1^\infty \frac{dx}{(1 + x)\sqrt{x}}.$$

The choice of $x = 1$ as a breaking point is arbitrary. Any other positive value of $x$ could have been used.
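The splitting of this mixed-type integral can be illustrated with an explicit antiderivative. The antiderivative $2\tan^{-1}\sqrt{x}$ is not stated in the text; it is our own and follows from the substitution $x = u^2$. The function names below are hypothetical.

```python
import math

def antiderivative(x):
    """2 * atan(sqrt(x)), an antiderivative of 1 / ((1 + x) sqrt(x)) for x > 0
    (our own, obtained from the substitution x = u^2)."""
    return 2.0 * math.atan(math.sqrt(x))

def piece_near_zero(eps):
    """Integral over [eps, 1]: the second-kind constituent with its singularity cut off."""
    return antiderivative(1.0) - antiderivative(eps)

def piece_to_infinity(T):
    """Integral over [1, T]: the first-kind constituent truncated at T."""
    return antiderivative(T) - antiderivative(1.0)

for k in (2, 6, 12):
    eps, T = 10.0 ** (-k), 10.0 ** k
    print(k, piece_near_zero(eps), piece_to_infinity(T),
          piece_near_zero(eps) + piece_to_infinity(T))
print(math.pi)
```

Each constituent tends to $\pi/2$, so under this antiderivative the value of the full integral is $\pi$; moving the breaking point away from $x = 1$ changes the two pieces but not their sum.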

When an integral of mixed type is separated into its constituent "pure-type" parts, it is called divergent if any one of the constituents is divergent. If singularities occur within the interval of integration, or at both ends of a finite interval of integration, the integral must be separated into several integrals, each of which is a pure type of either first or second kind.

Integrals with $-\infty$ as a limit of integration may be treated by methods parallel to those of §22.1, or may be reduced to integrals with $+\infty$ as a limit of integration, by the substitution $x = -u$.


Example 2. Consider $\displaystyle\int_{-\infty}^{\infty} \frac{x\,dx}{e^x + x^4}$. We separate this into

$$\text{(a)} \ \int_{-\infty}^{0} \frac{x\,dx}{e^x + x^4} \qquad \text{and} \qquad \text{(b)} \ \int_{0}^{\infty} \frac{x\,dx}{e^x + x^4}.$$

The integral (b) is convergent, since

$$\frac{x}{e^x + x^4} < x e^{-x}$$

if $x > 0$, and $\int_0^\infty x e^{-x}\,dx$ is convergent (Example 4, §22.1). The integral (a) is also convergent; for, as $x \to -\infty$, $e^x \to 0$, and the integrand behaves like $x^{-3}$.

One may set $x = -u$, and thus get

$$\int_{-\infty}^{0} \frac{x\,dx}{e^x + x^4} = \int_{\infty}^{0} \frac{u\,du}{e^{-u} + u^4} = -\int_{0}^{\infty} \frac{u\,du}{e^{-u} + u^4}.$$

In the transformed integral the integrand behaves like $u^{-3}$ as $u \to +\infty$. The original integral has thus been shown to be the sum of two convergent integrals of first kind.


EXERCISES 

1 . Examine each of the following integrals as to convergence or divergence, giving a 
complete analysis of the convergence or divergence of each of the constituent pure types. 


(a) f 
Jo 


dx 


xVl+x ; 


dx 


1 VxVl + x 5 

(c) J e~ x2 dx. 

(d) j x 2 e |jc| dx. 

(e) l 


o X ,I2 (X-1 ) 4 ' 3 

2 d* . 

1/3 


« J 

•'o 

w I] 
r dx 

1 Jo (cos x)^ 3 


dx 


(x- l) ,/ 2 (3-x) 273 ' 


<■> rs.. 


logx 


n<a 

• «J 

r r z/J 

(t+x'y~" w Jo l + t 


dx 


375* 


(sin x)' 

2. Proceed as directed in Exercise 1 with each of the following integrals: 

dog *) 2 j. ^ r (*-*-1? 

2/3 

dt. 


2 dx. 

5^5 dx. (e) 


dx. 


(b) f 

Jo 

. , //A 

i-oc — x i ^ 

(c) ”■ dx. (f) — “dx. 

J 0 xvx J-ocSinh'nx 


sinh ttx 

3. In each of the following integrals the integrand contains a parameter. For each integral find the range of values of the parameter such that the integral is not divergent.


(a) 

fl>) 


I fn* ® /. ('»';) 

<‘>L'(£r <s> 

( ' i, /. TTTT?*' (l " /. (t'Itt s ')* 

4. Where must the point (x, y) lie in the xy-plane if the integral - ^ 7 - - - -■ is to be 

Jo t (1 t t ) 

convergent? 

r°° e ~ xu _ 

5. Answer the question of Exercise 4 for the integral — 77 - — r^r- du. 

Jo w(l + e ) 

f “ - e ~ yu 

6 . Show that the integral — : - — u — du is convergent if 0 < x < y < 1. 

J — OQ l € 


22.2 / THE GAMMA FUNCTION 

One very interesting and important improper integral is the following, which
defines what is known as the gamma function:

\[ \Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt. \tag{22.2-1} \]

If $x \ge 1$ this is an integral of the first kind, and is convergent, by Example 4,
§22.1. If $x < 1$, however, the integral is of mixed type, with a singularity of the
integrand at $t = 0$, and we have to consider the integral

\[ \int_0^1 t^{x-1} e^{-t}\,dt \tag{22.2-2} \]

of second kind. Since $e^{-t} \to 1$ as $t \to 0$, it is clear that the integrand in (22.2-2)
behaves like $t^{x-1}$ near $t = 0$. Now

\[ \int_0^1 \frac{dt}{t^{1-x}} \]

is convergent if $1 - x < 1$, i.e., if $0 < x$, and divergent if $1 - x \ge 1$, i.e., if $x \le 0$.
Therefore, by the analogue of Theorem III, §22.1, for integrals of second kind,
(22.2-2) is convergent if and only if $x > 0$ (it is improper only if $x < 1$).

The integral from $t = 1$ to $t = \infty$ is always convergent, as we saw in Example
4, §22.1. The result is, then, that the integral (22.2-1) defining $\Gamma(x)$ is convergent
if and only if $x > 0$. Putting $x = 1$ we have

\[ \int_0^\infty e^{-t}\,dt = \lim_{T\to\infty} \int_0^T e^{-t}\,dt = \lim_{T\to\infty} \left(-e^{-T} + 1\right) = 1, \]

or

\[ \Gamma(1) = 1. \tag{22.2-3} \]

There is a very simple relation between the values of the gamma function at 
x and x + 1. This relation is found by carrying out an integration by parts. We 
start with 


\[ \Gamma(x+1) = \int_0^\infty t^{x} e^{-t}\,dt. \]

Setting $u = t^{x}$, $dv = e^{-t}\,dt$, we have $du = x t^{x-1}\,dt$, $v = -e^{-t}$,

\[ \int_0^T t^{x} e^{-t}\,dt = \Bigl[-t^{x} e^{-t}\Bigr]_0^T + \int_0^T x t^{x-1} e^{-t}\,dt; \]

letting $T \to \infty$, we see that

\[ \int_0^\infty t^{x} e^{-t}\,dt = -\lim_{T\to\infty} T^{x} e^{-T} + 0 + x \int_0^\infty t^{x-1} e^{-t}\,dt. \tag{22.2-4} \]

But

\[ \lim_{T\to\infty} T^{x} e^{-T} = 0, \]

as we see by applying l'Hospital's rule $n$ times to $T^{x}/e^{T}$, where $n$ is the first integer
greater than or equal to $x$. Therefore, by (22.2-4),

\[ \Gamma(x+1) = x\,\Gamma(x). \tag{22.2-5} \]

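From this formula and (22.2-3) we have successively,

\[ \Gamma(2) = 1 \cdot \Gamma(1) = 1 \]
\[ \Gamma(3) = 2 \cdot \Gamma(2) = 2 \cdot 1 \]
\[ \Gamma(4) = 3 \cdot \Gamma(3) = 3 \cdot 2 \cdot 1 \]
\[ \Gamma(5) = 4 \cdot \Gamma(4) = 4 \cdot 3 \cdot 2 \cdot 1 \]

In general we can write

\[ \Gamma(n+1) = n! \tag{22.2-6} \]

or

\[ \Gamma(n) = (n-1)! \tag{22.2-7} \]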

In the ordinary elementary sense $n!$ is defined only if $n$ is a positive integer.
But since $\Gamma(x)$ has been defined for every positive $x$, we see by (22.2-7) and
(22.2-3) that it is natural to make the agreement that $0! = 1$. This is customarily
done.
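
The relations (22.2-5) and (22.2-6) can be checked numerically. The following is a minimal sketch, not part of the original text: the helper name `gamma_integral`, the truncation point `upper`, and the step count are illustrative assumptions, and the midpoint rule is used only because it is simple. It is restricted to $x \ge 1$ so that the integrand stays bounded near $t = 0$.

```python
import math

def gamma_integral(x, upper=60.0, steps=100_000):
    """Midpoint-rule approximation of the integral (22.2-1), truncated at t = upper.

    Assumes x >= 1, so that the integrand is bounded near t = 0.
    The part of the integral beyond `upper` is neglected.
    """
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += t ** (x - 1.0) * math.exp(-t)
    return total * h

# Compare the integral for Gamma(n + 1) with n!, as in (22.2-6).
for n in range(1, 6):
    print(n, math.factorial(n), gamma_integral(n + 1))

# Check the recurrence (22.2-5) at a non-integer point.
x = 2.5
print(gamma_integral(x + 1), x * gamma_integral(x))
```

In each printed row the last two numbers should agree to several decimal places, which is what (22.2-6) and (22.2-5) predict.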

The gamma function gives us a convenient method of interpolating between
the values of the factorials $n!$, and this is one of the primary reasons for the
importance of the gamma function. Just now we shall take for granted that $\Gamma(x)$
is a continuous function, though we can prove this later on (§22.5, following
Theorem VIII). In fact, $\Gamma(x)$ has continuous derivatives of all orders, and is
analytic. The derivatives are found by differentiating with respect to the
parameter $x$ under the integral sign in (22.2-1). Recall that

\[ t^{x} = e^{x \log t}, \qquad \frac{\partial}{\partial x}\left(t^{x}\right) = \log t \cdot e^{x \log t} = t^{x} \log t. \]

Thus

\[ \Gamma'(x) = \int_0^\infty t^{x-1} (\log t)\, e^{-t}\,dt. \tag{22.2-8} \]

This is the same procedure as that given in (18.5-2) for proper integrals 
dependent on a parameter. For improper integrals further justification is 
required; the problem is much the same as the problem of justifying the 
differentiation of a series term by term. We return to this problem systematically 
in Theorem X, §22.5; for the present let us proceed with our study of the gamma 
function. We can differentiate a second time, obtaining 

\[ \Gamma''(x) = \int_0^\infty t^{x-1} (\log t)^{2} e^{-t}\,dt. \tag{22.2-9} \]

The integrals for $\Gamma'(x)$ and $\Gamma''(x)$ are convergent integrals of mixed type, with
singularities of the integrand at $t = 0$, and the same is true for the integrals giving
all the higher derivatives (see Exercise 1).

It is clear from (22.2-9) that $\Gamma''(x) > 0$, and therefore the curve $y = \Gamma(x)$ is
concave upward for all $x > 0$. We also see that $\Gamma(x) > 0$, $\Gamma(1) = \Gamma(2) = 1$. From
these facts we see that $\Gamma(x)$ has just one minimum value, and that this occurs for
a value of $x$ between 1 and 2. To see how $\Gamma(x)$ behaves as $x \to 0$, we observe that
if we integrate only from 0 to 1 in (22.2-1), the result is less than $\Gamma(x)$.
Furthermore, $e^{-t}$ is a decreasing function, so that $e^{-t} > e^{-1}$ if $0 \le t < 1$. Therefore

\[ \Gamma(x) > \int_0^1 t^{x-1} e^{-t}\,dt > e^{-1} \int_0^1 t^{x-1}\,dt = \frac{1}{ex}. \]

It follows from this that $\Gamma(x) \to +\infty$ as $x \to 0^{+}$. From the information which we
have now collected it is possible to show the general character of $\Gamma(x)$ on a
graph. We leave it for the student to prepare such a graph for himself.
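
As an aid in preparing the graph, the location of the minimum and the lower bound $1/(ex)$ can be examined numerically. This is only a sketch under stated assumptions: it relies on the library routine `math.gamma` rather than on the integral itself, and the ternary search (with the hypothetical helper name `ternary_min`) is justified here only because the curve is concave upward, as shown above.

```python
import math

def ternary_min(f, lo, hi, iters=200):
    """Locate the minimizer of a function that is concave upward on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2   # the minimizer cannot lie to the right of m2
        else:
            lo = m1   # the minimizer cannot lie to the left of m1
    return 0.5 * (lo + hi)

# The minimum of Gamma on (1, 2).
x_min = ternary_min(math.gamma, 1.0, 2.0)
print(x_min, math.gamma(x_min))

# Gamma(x) compared with the lower bound 1/(e x) as x -> 0+.
for x in (0.5, 0.1, 0.01, 0.001):
    print(x, math.gamma(x), 1.0 / (math.e * x))
```

The search should settle near $x = 1.46$, between 1 and 2 as the argument in the text requires, and each value of $\Gamma(x)$ in the second table should exceed the corresponding bound.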

The formula (22.2-1) does not define a function if $x \le 0$. Nevertheless we
can define $\Gamma(x)$ for certain negative values of $x$ by using formula (22.2-5). If
$-1 < x < 0$, then $0 < x + 1$, so that $\Gamma(x+1)$ has a meaning already defined. We
then define $\Gamma(x)$ by requiring

\[ \Gamma(x) = \frac{\Gamma(x+1)}{x}. \tag{22.2-10} \]

Thus, for instance

\[ \Gamma\left(-\tfrac12\right) = -2\,\Gamma\left(\tfrac12\right). \]

Now suppose that $-2 < x < -1$; then $-1 < x + 1 < 0$, so that $\Gamma(x+1)$ is already
defined. We then define $\Gamma(x)$ by (22.2-10), e.g.,

\[ \Gamma\left(-\tfrac32\right) = -\tfrac23\,\Gamma\left(-\tfrac12\right). \]

This process can evidently be continued, so that we obtain a definition of $\Gamma(x)$
for all values of $x$ except $0, -1, -2, -3, \ldots$, and the equation (22.2-10) holds for
all other values of $x$.

It is easy to see that $\Gamma(x) < 0$ when $-1 < x < 0$, and that $\Gamma(x) \to -\infty$ as $x \to 0^{-}$
or $x \to -1^{+}$. We leave it for the student to study the situation when $-2 < x < -1$,
$-3 < x < -2$, and so on. A rough graph should be constructed. It will be shown
later that $\Gamma(\tfrac12) = \sqrt{\pi}$ (see (22.41-6)); from this we may calculate $\Gamma(-\tfrac12)$, etc.
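
The step-by-step extension to negative $x$ can be written as a short recursion. The sketch below is illustrative only: the name `gamma_extended` is an assumption, and `math.gamma` is used for positive arguments in place of the integral (22.2-1).

```python
import math

def gamma_extended(x):
    """Gamma(x) for x > 0, and for negative non-integer x by repeated use of (22.2-10)."""
    if x > 0:
        return math.gamma(x)
    if x == math.floor(x):
        # The excluded points 0, -1, -2, ...
        raise ValueError("Gamma is not defined at 0, -1, -2, ...")
    # Gamma(x) = Gamma(x + 1) / x, applied until the argument becomes positive.
    return gamma_extended(x + 1.0) / x

root_pi = math.sqrt(math.pi)
print(gamma_extended(-0.5), -2.0 * root_pi)         # Gamma(-1/2) = -2 Gamma(1/2)
print(gamma_extended(-1.5), (4.0 / 3.0) * root_pi)  # Gamma(-3/2) = -(2/3) Gamma(-1/2)
print(gamma_extended(-0.25), gamma_extended(-1.25)) # signs alternate from interval to interval
```

The alternation of sign seen in the last line is the behavior the student is asked to study on the intervals $-2 < x < -1$, $-3 < x < -2$, and so on.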


EXERCISES 

1. Show that $\int_0^\infty t^{x-1} (\log t)^{n} e^{-t}\,dt$ is convergent for $n = 1, 2, \ldots$ if $0 < x$.

2. Prepare a graph of $y = \Gamma(x)$, showing the general behavior of the gamma function
for $x > 0$ and in the intervals $-1 < x < 0$, $-2 < x < -1$, etc.

3. Show that $\Gamma(x) = 2 \int_0^\infty u^{2x-1} e^{-u^{2}}\,du$ if $x > 0$.

4. Calculate the value in terms of V tt of 
(a) Jo x V x2 dx, (b) Jo x V x2 dx. 

5. If $a > 0$, show that $\int_0^\infty x^{n-1} e^{-ax}\,dx = a^{-n}\,\Gamma(n)$.

What is the implied restriction on n? 

6. Calculate in terms of Vir the values of 
(a) Jo x~ ll2 e~ 2x dx, (b) Jo x m e~ 4x dx. 

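7. Show by (22.2-5) that, if $n = 1, 2, \ldots$,

\[ \Gamma\left(n + \tfrac12\right) = \frac{1 \cdot 3 \cdot 5 \cdots (2n-1)}{2^{n}} \sqrt{\pi}. \]

As a consequence show that

\[ \sqrt{\pi}\,\Gamma(2n+1) = 2^{2n}\,\Gamma\left(n + \tfrac12\right) \Gamma(n+1), \]

and

\[ \sqrt{\pi}\,\Gamma(2n) = 2^{2n-1}\,\Gamma(n)\,\Gamma\left(n + \tfrac12\right). \]

These formulas suggest the conjecture that perhaps

\[ \sqrt{\pi}\,\Gamma(2x) = 2^{2x-1}\,\Gamma(x)\,\Gamma\left(x + \tfrac12\right) \]

not merely for $x = n + \tfrac12$ and $x = n$, where $n$ is a positive integer, but for all $x > 0$. The
conjecture is correct, as can be proved by later developments (see Exercise 8, §22.7).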


8. Show that

\[ \frac{1 \cdot 3 \cdots (2n-1)}{2 \cdot 4 \cdots 2n} = \frac{\Gamma\left(n + \tfrac12\right)}{\sqrt{\pi}\,\Gamma(n+1)}. \]

9. Derive the formula $\Gamma(x) = \int_0^1 \left(\log \frac{1}{u}\right)^{x-1} du$ by putting $u = e^{-t}$ in (22.2-1). Then
set $u = v^{a}$, where $a > 0$, and so find the value of

\[ \int_0^1 \left(\log \frac{1}{v}\right)^{x-1} v^{a-1}\,dv, \]

where $x > 0$.

10. Utilize the results of Exercise 9 to show that


(a) f'l 

Jo 

JlogCl /»)) 

1/2 

| dt = V2ir, 

(b) f'l 

Jo 

f ‘ 1 


Uog(l/0/ 

1 dt - yj 3 


22.3 / ABSOLUTE CONVERGENCE 

An improper integral of first kind, $\int_a^\infty f(x)\,dx$, is called absolutely convergent if
the integral $\int_a^\infty |f(x)|\,dx$ is convergent. Exactly the same definition is applied to
integrals of second kind, and to integrals of mixed type. The switch from $f(x)$ to
$|f(x)|$ corresponds exactly to the switch from $\sum a_n$ to $\sum |a_n|$ in defining absolute
convergence of the infinite series. If an integral is convergent, but not absolutely
convergent, it is called conditionally convergent.

The following theorem corresponds to Theorem IX, §19.3: 

THEOREM V. If the integral $\int_a^\infty |f(x)|\,dx$ is convergent, so is $\int_a^\infty f(x)\,dx$.

In other words, if an integral is absolutely convergent, it is convergent. 

Proof of the theorem. First of all we observe that

\[ 0 \le |f(x)| - f(x) \le 2|f(x)|. \tag{22.3-1} \]

Both parts of this double inequality may be checked by considering separately
the cases when $f(x) \ge 0$ and $f(x) < 0$. Now let $g(x) = |f(x)| - f(x)$. Since
$\int_a^\infty |f(x)|\,dx$ is assumed to be convergent, the integral with $2|f(x)|$ as integrand is
also convergent. Then, by (22.3-1) and Theorem II, §22.1, we see that $\int_a^\infty g(x)\,dx$
is convergent. But $f(x) = |f(x)| - g(x)$, and therefore $\int_a^\infty f(x)\,dx$ is convergent, for
sums and differences of convergent integrals are convergent, as may be seen
directly from the definition of convergence. (What theorem about limits is used
at this last step in the argument?)

The theorem and its proof apply to integrals of the second kind; only the 
limits of integration have to be changed. 

To test whether an integral is absolutely convergent, we can apply the
methods of §§22.1, 22.11, since the integrand $|f(x)|$ is never negative. If an
integral is conditionally convergent, the demonstration of its convergence is 
usually a more delicate matter. Many of the instances of practical importance 
can be handled by the following theorem, which is analogous to Dirichlet’s test 
for series (§19.7). 

THEOREM VI. Consider an improper integral of first kind of the form

\[ \int_a^\infty \phi(t) f(t)\,dt, \tag{22.3-2} \]

where the functions $\phi$ and $f$ satisfy the conditions:

(a) $\phi'(t)$ is continuous, $\phi'(t) \le 0$, and $\lim_{t\to\infty} \phi(t) = 0$,

(b) $f(t)$ is continuous, and the integral

\[ F(x) = \int_a^x f(t)\,dt \tag{22.3-3} \]

is bounded for all $x \ge a$. Then the integral (22.3-2) is convergent.

Proof. We note that $F'(x) = f(x)$. Therefore, integrating by parts and noting
that $F(a) = 0$, we have

\[ \int_a^x \phi(t) f(t)\,dt = \int_a^x \phi(t) F'(t)\,dt = \phi(x) F(x) - \int_a^x \phi'(t) F(t)\,dt. \tag{22.3-4} \]

Let us suppose that $M$ is a bound for $|F(x)|$, that is, $|F(x)| \le M$. Then
$|\phi(x) F(x)| \le |\phi(x)| M$, and so $\phi(x) F(x) \to 0$ as $x \to \infty$, since $\phi(x) \to 0$ by
hypothesis. It then follows from (22.3-4) that (22.3-2) is convergent, provided we can
show that the integral

\[ \int_a^\infty \phi'(t) F(t)\,dt \tag{22.3-5} \]

is convergent. This integral is in fact absolutely convergent. For, since $\phi'(t) \le 0$,

\[ |\phi'(t) F(t)| = -\phi'(t)\,|F(t)| \le -M \phi'(t). \]

It is then enough to show that

\[ \int_a^\infty -M \phi'(t)\,dt \tag{22.3-6} \]

is convergent; then (22.3-5) will be absolutely convergent, by Theorem II, §22.1.
Now

\[ \int_a^x -M \phi'(t)\,dt = -M \phi(x) + M \phi(a) \to M \phi(a) \]

as $x \to \infty$, by condition (a). Thus (22.3-6) is convergent, and the proof is
complete.

Example 1. The integral $\int_0^\infty (1/t) \sin t\,dt$ is convergent. (There is no
singularity at $t = 0$; see the remark in Example 2, §22.11.)

Here we take

\[ \phi(t) = \frac{1}{t}, \qquad f(t) = \sin t. \]

Then

\[ F(x) = \int_0^x \sin t\,dt = 1 - \cos x. \]

The conditions of Theorem VI are fulfilled, for $\phi'(t) = -1/t^{2}$ and $0 \le F(x) \le 2$.
Therefore, the given integral is convergent. It is not absolutely convergent,
however, as is not difficult to see (for a hint on this see Exercise 6).
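
The contrast between convergence and absolute convergence in Example 1 can be observed numerically. The following sketch is an illustration under stated assumptions, not a proof: the helper names are hypothetical, the midpoint rule is chosen because it never evaluates the integrand at $t = 0$, and the limiting value $\pi/2$ quoted in the comment is a standard result that is not derived in this section.

```python
import math

def midpoint(f, a, b, steps):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

def partial_integrals(n, steps_per_arch=200):
    """Integrals of sin(t)/t and of |sin(t)|/t over [0, n*pi]."""
    upper = n * math.pi
    steps = steps_per_arch * n
    plain = midpoint(lambda t: math.sin(t) / t, 0.0, upper, steps)
    absolute = midpoint(lambda t: abs(math.sin(t)) / t, 0.0, upper, steps)
    return plain, absolute

# The first column of results settles down (toward pi/2);
# the second keeps growing, roughly like a multiple of log n.
for n in (10, 100, 1000):
    plain, absolute = partial_integrals(n)
    print(n, plain, absolute)
```

The slow, logarithmic growth of the second column is consistent with the estimate suggested in Exercise 6.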


Example 2. The integral $\int_0^\infty \sin u^{2}\,du$ is convergent. We make a change of
variable,

\[ u = \sqrt{t}, \qquad du = \frac{dt}{2\sqrt{t}}. \]

Then

\[ \int_0^x \sin u^{2}\,du = \frac{1}{2} \int_0^{x^{2}} \frac{\sin t}{\sqrt{t}}\,dt. \tag{22.3-7} \]

Now $\int_0^\infty \frac{\sin t}{\sqrt{t}}\,dt$ is convergent, by an application of Theorem VI very
much the same as in Example 1. Hence, letting $x \to \infty$ in (22.3-7), we see that

\[ \int_0^\infty \sin u^{2}\,du = \frac{1}{2} \int_0^\infty \frac{\sin t}{\sqrt{t}}\,dt. \]

The integral of Example 2 illustrates the remark made at the end of §22.1,
for $\sin u^{2}$ does not approach zero as $u \to \infty$.
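
The change of variable in Example 2 can be checked at a finite upper limit, where both sides of (22.3-7) are proper integrals. This is a rough sketch only: the function names `lhs` and `rhs` and the step counts are assumptions, and the limiting value $\sqrt{\pi/8}$ mentioned in the comment is a standard result not established in the text.

```python
import math

def midpoint(f, a, b, steps):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

def lhs(x, steps=100_000):
    """Left member of (22.3-7): integral of sin(u^2) over [0, x]."""
    return midpoint(lambda u: math.sin(u * u), 0.0, x, steps)

def rhs(x, steps=100_000):
    """Right member of (22.3-7): half the integral of sin(t)/sqrt(t) over [0, x^2]."""
    return 0.5 * midpoint(lambda t: math.sin(t) / math.sqrt(t), 0.0, x * x, steps)

print(lhs(5.0), rhs(5.0))   # the two members of (22.3-7) at x = 5

# The partial integrals oscillate with shrinking amplitude about sqrt(pi/8),
# even though sin(u^2) itself does not tend to zero.
for x in (5.0, 10.0, 20.0):
    print(x, lhs(x), math.sqrt(math.pi / 8.0))
```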


EXERCISES 


1 . Examine the convergence of each of the following integrals. Where possible, 
prove that the integral is absolutely convergent. If it is necessary to use Theorem VI, give 
details of the application of the theorem in the particular case. Proofs that an integral is 
not absolutely convergent need not be given. 



x cos x 
a 2 +x 2 


dx . 



cos x 

(HVX4 + X 1 ) 


cos X 

vmp 


dx. 


dx. 



dx. 


(j) 

f°° sin ax , 

Jo l + X* dX - 

<w 

f°° cosx , 

Li- X * dx - 

(1) 

f°° cos X , 

J-. .» * 

(m) 

f“sinx sin mx ^ 

Jo x 


( e) riog(logx) 
J 2 

oe 

® I 

« r 
« /: 


cos x dx 

logx 
x 2 - x + 2 

dx. 


: x 4 + lOx + 9 

x(x 2 + 1) sin x 
x 4 -x 2 + 1 
sin ttx , 


dx. 


x(x 2 -l) 
sin x 


... f" x sin : 

(,) Lt+7 


dx. 


o X 
sinx 


, ^ f e si 
. (n) 

Jo X 

(o) r 

Jo 

(p> I. 

« /: 

« /: 


sin x 


dx. 


dx. 


e - 1 
sin(sin x) cos x 


x + 1 . , 

— 575- sin x dx. 


dx. 


e x cos 2x dx. 





2. Show that 


sin ax . .. ~ , , 

— p — ax is convergent if 0 < p < 2, and absolutely convergent if 


1<P <2. 

J -°° | COS dX 

p dx is convergent if 1 <p <3. Is it absolutely convergent 

o x 

for any of these values of p ? 

4. Show that f - i 1 - — ( 1- — co - dx is convergent if 0<p <4. Is it absolutely con- 
J o x 

vergent for any of these values of p? 

5. Show that $\int_0^\infty \cos(x^{2})\,dx$ and $\int_0^\infty x \cos(x^{4})\,dx$ are convergent. Note that the
integrand in the second integral is unbounded.

6. Show that 


(n 4- l)7r 


n = 0, 1, . . 


Use this result to prove that 


dx is not absolutely convergent. 


22.4 / IMPROPER MULTIPLE INTEGRALS. FINITE REGIONS 

Consider first the case of an integral

\[ \iint_R f(x, y)\,dA, \tag{22.4-1} \]

where $R$ is a closed bounded region, $f$ is continuous in $R$ except at one point
$(x_0, y_0)$, and the behavior of $f$ at that point is such that the function is not
integrable over $R$ in the sense of §18.6. The cases of greatest practical
importance are those in which $f(x, y)$ either becomes infinite or has a factor which
becomes infinite as $(x, y) \to (x_0, y_0)$, e.g.,

/(*. y) = j: or f(x, y) = * r r ° ’ 

where

\[ r = \left[(x - x_0)^{2} + (y - y_0)^{2}\right]^{1/2}. \tag{22.4-2} \]

To define what we mean by the convergence or divergence of the integral
(22.4-1) we proceed as follows: Let $R'$ be a region derived
from $R$ by discarding a small region $\Delta R$ having the point
$(x_0, y_0)$ in its interior ($R'$ is the shaded portion of $R$ in Fig. 181).
No restriction is placed on the shape of $\Delta R$ except
that it be a Riemann region in the sense defined in §18.6.
Of course, we also assume that $R$ is a Riemann region.
Let $d$ be the maximum diameter of $\Delta R$, that is, the distance
between two points of $\Delta R$ which are as far apart as it is
possible for them to be when both are in $\Delta R$. We then consider the integral

\[ \iint_{R'} f(x, y)\,dA \]

over the region $R'$. If this integral approaches a limiting value as $d \to 0$, the value of
the limit is denoted by

\[ \iint_R f(x, y)\,dA = \lim_{d \to 0} \iint_{R'} f(x, y)\,dA \tag{22.4-3} \]

and the integral (22.4-1) is said to be convergent. If the limit does not exist, the
integral is called divergent.

Fig. 181.

We shall confine our attention to the case when the integral of $|f(x, y)|$ is
convergent. In this case the integral of $f(x, y)$ is also convergent, as may be
proved in much the same way that we proved Theorem V, §22.3. The integral of
$f(x, y)$ is then said to be absolutely convergent.

The comparison principle is valid in the following form for improper double
integrals: If $|f(x, y)| \le g(x, y)$, and if $\iint_R g(x, y)\,dA$ is convergent, then
$\iint_R f(x, y)\,dA$ is absolutely convergent.

If one is trying to determine whether or not the integral of a positive
function is convergent, it is not necessary to consider regions $\Delta R$ of arbitrary
shape in examining the limit (22.4-3). It is sufficient to confine one's attention to
regions of one particular shape, say circles (or, alternatively, squares) with
center at $(x_0, y_0)$. Let $\delta$ be the radius of such a circle, and let $R'$ be the region
which is obtained by deleting from $R$ the interior of the circle in question. Then,
if $g(x, y) \ge 0$, and if the limit of $\iint_{R'} g(x, y)\,dA$ exists as $\delta \to 0$, the integral of $g$
over $R$ will be convergent and will be equal to the limit just mentioned. Proof of
this fact is indicated in Exercise 8.

All the foregoing remarks apply without essential change to the case in
which $(x_0, y_0)$ is on the boundary of $R$ instead of in the interior of $R$. The only
difference is that we discard from $R$ that part of $R$ which is in a small region
having $(x_0, y_0)$ in its interior.

Example 1. Let $r$ be defined by (22.4-2) and consider the integral

\[ \iint_R \frac{1}{r^{m}}\,dA, \tag{22.4-4} \]

it being understood that $(x_0, y_0)$ is a point of $R$ and $m > 0$, so that the integral is
improper. We shall show that it is convergent if $m < 2$. Since the only singularity
is at $(x_0, y_0)$, the typical difficulty is exposed in the case that $R$ is a circle with
center at $(x_0, y_0)$, and of radius $c$. Let us delete a small concentric circle of radius
$\delta$, so that $R'$ is the annulus between these circles. In evaluating the integral we
may as well assume that $(x_0, y_0)$ is the origin, since the value of the integral is not
affected by the location of the axes. Then, with the use of polar co-ordinates,

\[ \iint_{R'} \frac{dA}{r^{m}} = \int_0^{2\pi} d\theta \int_\delta^{c} r^{1-m}\,dr = \frac{2\pi}{2-m} \left[c^{2-m} - \delta^{2-m}\right]. \]

Since $m < 2$ we see that

\[ \lim_{\delta \to 0} \iint_{R'} \frac{dA}{r^{m}} = \frac{2\pi}{2-m}\, c^{2-m}. \]

Thus the integral (22.4-4) exists if $R$ is a circle with center at $(x_0, y_0)$ and $m < 2$.
For regions of other shape, and for $(x_0, y_0)$ located on the boundary of $R$, the
difficulty can easily be resolved in terms of the case we have treated.
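
The limiting process of Example 1 can be followed numerically. In this sketch the function names are hypothetical and the radial integral is approximated by the midpoint rule; the case $m > 2$ is included only to show how the same closed form behaves when the hypothesis $m < 2$ fails.

```python
import math

def annulus_integral(m, c, delta, steps=100_000):
    """Integral of r**(-m) over the annulus delta <= r <= c, in polar co-ordinates.

    The angular integration contributes the factor 2*pi; the radial
    integral of r**(1 - m) is approximated by the midpoint rule.
    """
    h = (c - delta) / steps
    radial = h * sum((delta + (k + 0.5) * h) ** (1.0 - m) for k in range(steps))
    return 2.0 * math.pi * radial

def closed_form(m, c, delta):
    """The value (2*pi / (2 - m)) * (c**(2 - m) - delta**(2 - m)), valid for m != 2."""
    return 2.0 * math.pi / (2.0 - m) * (c ** (2.0 - m) - delta ** (2.0 - m))

# Numerical and closed-form values agree on a fixed annulus.
print(annulus_integral(1.5, 1.0, 0.01), closed_form(1.5, 1.0, 0.01))

# For m < 2 the values approach 2*pi*c**(2 - m)/(2 - m) as delta -> 0;
# for m > 2 they grow without bound.
for delta in (1e-2, 1e-4, 1e-6):
    print(delta, closed_form(1.5, 1.0, delta), closed_form(2.5, 1.0, delta))
```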


Example 2. The integral

\[ \iint_R \frac{x - x_0}{r^{2}}\,dA \]

is absolutely convergent; for $|x - x_0| \le r$, by (22.4-2), and so

\[ \left| \frac{x - x_0}{r^{2}} \right| \le \frac{1}{r}, \tag{22.4-5} \]

whence, by the comparison-test principle and the result of Example 1, the
asserted result follows.


Similar considerations apply to improper triple integrals. Improper multiple 
integrals in which the integrand has just one singular point in the region of 
integration occur typically in the theory of force fields governed by the inverse- 
square law, e.g., gravitational or electrostatic fields. From a purely mathematical 
point of view the study of such fields belongs to what is called potential theory. 
Let $R$ be a bounded closed region in 3-space, and let it be filled with mass of
density $\mu(x, y, z)$. If $Q$ is the point $(x_0, y_0, z_0)$, and

\[ r = \left[(x - x_0)^{2} + (y - y_0)^{2} + (z - z_0)^{2}\right]^{1/2}, \]

the Newtonian potential at $Q$ produced by the total mass is

\[ \phi(x_0, y_0, z_0) = \iiint_R \frac{\mu}{r}\,dV, \tag{22.4-6} \]

and the gravitational field at $Q$ is a vector $F$ whose first component (in the
$x$-direction) is

\[ F_1 = \iiint_R \frac{\mu\,(x - x_0)}{r^{3}}\,dV, \tag{22.4-7} \]

with similar formulas for $F_2$ and $F_3$. If $Q$ is a point of $R$ these are improper
integrals; they are absolutely convergent if $\mu(x, y, z)$ is bounded and integrable.
One can also discuss the potential and field due to distribution of mass on
surfaces, and in particular on plane regions. If $Q$ is a point in the plane region,
the double integral defining the potential at $Q$ is absolutely convergent, but the
integrals defining the components of the field at $Q$ are divergent except in very
special cases.


EXERCISES 

1. Let $R$ be the circular region of radius 1 with center at the origin. Determine, for
each of the following integrals, if it is convergent or divergent. Polar co-ordinates are
denoted by $r$, $\theta$.

<•> JJb* fyyT* dA. (d) JJ log ~ dA. 

R R 


(b) //(PT7¥ dA - 



■ f sin -= 


(0 J 

J -pfdA. 

«»// 


2 + V 

+T 5 Y ndA - 

l R 

2. Let $f(x, y)$ be continuous in $R$ except at $(x_0, y_0)$, and bounded. Let
$r^{2} = (x - x_0)^{2} + (y - y_0)^{2}$. Show that $\iint_R \frac{f(x, y)}{r^{m}}\,dA$ is absolutely convergent if $m < 2$.

3. State and prove a result corresponding to that of Exercise 2, for improper triple 
integrals. 

4. If R is a closed and bounded plane region containing the origin, for what values 
of p is the integral 



R 


certainly absolutely convergent? 

5. Let $R$ be the unit sphere $x^{2} + y^{2} + z^{2} \le 1$, and let $r^{2} = x^{2} + y^{2} + z^{2}$. Find the values
of those among the following integrals which are convergent. If a literal constant appears,
indicate the restrictions you place on it to insure convergence.


<■>///; 

R 

« III 


dV. 


(d) 




r dV. 


<*> in 


x 2 + y 4 + z ( 


dV. 


(0 (f) /// 


* 2 y 2 z 2 AV 
- r iw dv . 
6. (a) Is the integral J J y 2 ji£ convergent or divergent, where R is the

R 

region jc 2 + y 2 S 1? 

(b) What if the exponent | is replaced by ml 


(c) For what values of 


“llwr- 


llx-l r+y 2 ] 


2 1P 


dA certainly convergent? 


R 

7. Let $R$, $f$, $\Delta R$, $d$ have the meanings used in the discussion of (22.4-3).

(a) Let $W$ be a subregion of $R$ which contains $(x_0, y_0)$ and all the points of $R$ in some
neighborhood of $(x_0, y_0)$. Show that $\iint_R f(x, y)\,dA$ is convergent if and only if
$\iint_W f(x, y)\,dA$ is convergent, and then

\[ \iint_R f(x, y)\,dA = \iint_{R-W} f(x, y)\,dA + \iint_W f(x, y)\,dA, \]

where $R - W$ is the region which results by removing $W$ from $R$.
Suggestion: Consider

\[ \iint_{R - \Delta R} f(x, y)\,dA - \iint_{W - \Delta R} f(x, y)\,dA, \]

where $\Delta R$ is so small that it is contained in $W$.

(b) Deduce from the result in (a) that $\lim_{d \to 0} \iint_{\Delta R} f(x, y)\,dA = 0$ if $\iint_R f(x, y)\,dA$ is convergent.

8. Suppose $g(x, y)$ is nonnegative and continuous in $R$ except at $(x_0, y_0)$. Let $\{W_n\}$ be a
sequence of subregions of the type of $W$ in Exercise 7(a). Suppose $W_1$ contains $W_2$, $W_2$
contains $W_3$, etc., and that $d_n \to 0$ as $n \to \infty$, where $d_n$ is the maximum diameter of $W_n$. Finally,
if $R_n = R - W_n$, assume that

\[ \lim_{n \to \infty} \iint_{R_n} g(x, y)\,dA \text{ exists, } = I. \]

Prove that $\iint_R g(x, y)\,dA$ is convergent, with value $I$. Suggestion: For a given $m$, choose
$\Delta R$ so small that it is contained in $W_m$. Then choose $n$ so that $W_n$ is contained in $\Delta R$.
Now show that

\[ \iint_{R - \Delta R} g(x, y)\,dA - I = \iint_{W_m - \Delta R} g(x, y)\,dA + \iint_{R_m} g(x, y)\,dA - I, \]

and that

\[ \iint_{W_m - \Delta R} g(x, y)\,dA \le \iint_{R_n} g(x, y)\,dA - \iint_{R_m} g(x, y)\,dA. \]

From here it is easy to complete the proof. Write out the whole argument carefully.
9. Prove the validity of the comparison principle stated in the text.

10. Suppose $\iint_R |f(x, y)|\,dA$ is convergent. (a) Let $\{W_n\}$ be a sequence of regions of
the type described in Exercise 8. By considering the inequality $0 \le |f(x, y)| - f(x, y) \le
2|f(x, y)|$, show that

\[ \lim_{n \to \infty} \iint_{R_n} f(x, y)\,dA \text{ exists, } = I, \text{ say.} \]

(b) If $\Delta R$ is so small that it is contained in $W_n$, show that

\[ \left| \iint_{R - \Delta R} f(x, y)\,dA - I \right| \le \iint_{W_n} |f(x, y)|\,dA + \left| \iint_{R_n} f(x, y)\,dA - I \right| \]

and so deduce that $\iint_R f(x, y)\,dA = I$. Note the result of Exercise 7(b).


22.41 / IMPROPER MULTIPLE INTEGRALS. INFINITE REGIONS 

In this section we are going to consider integrals of the type 

\[ \iint_R f(x, y)\,dA, \tag{22.41-1} \]

where $R$ is an unbounded region, such as the first quadrant ($x \ge 0$, $y \ge 0$), an
infinite strip (say between the lines $y = 0$, $y = 1$), or even the entire $xy$-plane. In
any case we assume that the boundary of $R$ is regular enough to cause us no
trouble, and we tacitly assume the same about all other regions subsequently
mentioned in this section. We also assume that $f$ is integrable, in the ordinary
proper sense, over any bounded closed subregion of $R$. Thus the only problem
arises from the fact that $R$ is an infinite region.

We might proceed as follows: Let us take a sequence of concentric circles
with centers at $O$ and radii becoming infinite. Let $\{R_n\}$ be the sequence of
regions obtained by considering the part of $R$ inside or on the $n$th circle. Then
define

\[ \iint_R f(x, y)\,dA = \lim_{n \to \infty} \iint_{R_n} f(x, y)\,dA \tag{22.41-2} \]

provided this limit exists. This seems like a reasonable procedure, as indeed it is,
under suitable conditions. But, one may ask, why not use squares instead of
circles? Or, why not regions more general than just circles and squares? Will one
get the same limit in all cases? The answer is, perhaps not, unless some further
restriction is placed on the function $f$. Suppose, however, that we assume
$f(x, y) \ge 0$. Then it may be shown that, if the limit (22.41-2) exists when we use
circles to get $R_n$, it also exists and has the same value when we use squares or
any other sequence of regions subject to reasonable restrictions. To understand
the essential principle of what is involved here, let $\{R_n\}$ be a sequence of regions
formed by using circles as already described, and let $\{S_n\}$ be a sequence obtained
in the same way, using squares (instead of circles) with center at $O$ and length of
side becoming infinite. Also, let

\[ I_n = \iint_{R_n} f(x, y)\,dA, \qquad J_n = \iint_{S_n} f(x, y)\,dA, \]

and write

\[ I = \lim_{n \to \infty} I_n. \]

Since each region $R_n$ contains its predecessor $R_{n-1}$, and since $f(x, y) \ge 0$, it is
clear that

\[ I_1 \le I_2 \le I_3 \le \cdots \le I. \]

Likewise

\[ J_1 \le J_2 \le J_3 \le \cdots. \]

Now any one of the squares is contained in some one of the circles, and vice
versa. Therefore, given $n$, there is some $N$ such that

\[ J_n \le I_N, \]

and given $m$, there is some $M$ such that

\[ I_m \le J_M. \]

These inequalities show that the sequences $\{I_n\}$ and $\{J_n\}$ both have the same limit.

These arguments show that it is sufficient to define the convergence of the
integral (22.41-1) in the manner already indicated if $f(x, y) \ge 0$.

We define absolute convergence of (22.41-1) in the usual way. It is then easy 
to see that the definition (22.41-2) is satisfactory for absolutely convergent 
integrals. This is all we shall deal with. In practice we shall use regions R n 
obtained by using squares, circles, or other regions as convenience dictates. 

Example 1. If $R$ is the entire $xy$-plane, and $a > 0$,

\[ \iint_R \left(x^{2} + y^{2} + a^{2}\right)^{-3/2} dA = \frac{2\pi}{a}. \tag{22.41-3} \]

Using polar co-ordinates, and letting $R'$ denote an arbitrary circular region
$x^{2} + y^{2} \le c^{2}$, we have

\[ \iint_{R'} \left(x^{2} + y^{2} + a^{2}\right)^{-3/2} dA = \int_0^{2\pi} d\theta \int_0^{c} \left(r^{2} + a^{2}\right)^{-3/2} r\,dr = 2\pi \left[\frac{1}{a} - \left(c^{2} + a^{2}\right)^{-1/2}\right]. \]

Letting $c \to \infty$, we see that there is a limit, so that the integral over all of $R$ is
convergent, and given by (22.41-3).
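
The computation in Example 1 can be checked on a disk of finite radius and then followed as the radius grows. The sketch below is illustrative only; the function names and the use of the midpoint rule for the radial integral are assumptions made for the demonstration.

```python
import math

def disk_integral_numeric(a, c, steps=100_000):
    """Integral of (x^2 + y^2 + a^2)**(-3/2) over the disk of radius c.

    In polar co-ordinates the integrand is r * (r^2 + a^2)**(-3/2);
    the angular integration contributes the factor 2*pi.
    """
    h = c / steps
    radial = 0.0
    for k in range(steps):
        r = (k + 0.5) * h
        radial += r * (r * r + a * a) ** -1.5
    return 2.0 * math.pi * radial * h

def disk_integral_closed(a, c):
    """The closed form 2*pi*(1/a - (c^2 + a^2)**(-1/2)) from Example 1."""
    return 2.0 * math.pi * (1.0 / a - 1.0 / math.sqrt(c * c + a * a))

a = 2.0
print(disk_integral_numeric(a, 10.0), disk_integral_closed(a, 10.0))

# As c grows the values approach 2*pi/a, as asserted in (22.41-3).
for c in (10.0, 1e3, 1e6):
    print(c, disk_integral_closed(a, c), 2.0 * math.pi / a)
```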

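Example 2. If $R$ is the entire first quadrant in the $xy$-plane,

\[ \iint_R e^{-x^{2} - y^{2}}\,dA = \frac{\pi}{4}. \tag{22.41-4} \]

To obtain this result, let $R'$ be the part of $R$ in the circle $x^{2} + y^{2} \le c^{2}$ (see Fig.
182).

[Fig. 182. Fig. 183.]

Using polar co-ordinates, we have

\[ \iint_{R'} e^{-x^{2} - y^{2}}\,dA = \int_0^{\pi/2} d\theta \int_0^{c} e^{-r^{2}} r\,dr = \frac{\pi}{4} \left(1 - e^{-c^{2}}\right). \]

Hence

\[ \iint_R e^{-x^{2} - y^{2}}\,dA = \lim_{c \to \infty} \iint_{R'} e^{-x^{2} - y^{2}}\,dA = \frac{\pi}{4}. \]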

Let us consider what happens if we attempt to evaluate the integral in
Example 2 by using squares instead of circles in the limiting process. Let $R''$ be
the part of $R$ inside the square with sides along the lines $x = 0$, $x = c$, $y = 0$, $y = c$
(see Fig. 183). Then

\[ \iint_{R''} e^{-x^{2} - y^{2}}\,dA = \int_0^{c} dx \int_0^{c} e^{-x^{2} - y^{2}}\,dy = \left(\int_0^{c} e^{-x^{2}}\,dx\right) \left(\int_0^{c} e^{-y^{2}}\,dy\right) = \left(\int_0^{c} e^{-x^{2}}\,dx\right)^{2}. \]

At the last step we have changed $y$ to $x$ under the integral sign, this being
permitted since the value of the integral does not depend on the variable of
integration.

Now we know from the earlier discussion that the value of the improper
double integral is given just as well by the limiting process with squares as by
that with circles. Therefore, in view of (22.41-4), we have

\[ \left(\int_0^\infty e^{-x^{2}}\,dx\right)^{2} = \frac{\pi}{4}, \]

or

\[ \int_0^\infty e^{-x^{2}}\,dx = \frac{\sqrt{\pi}}{2}. \tag{22.41-5} \]

This last result probably appears to the student as an unexpected by-product of 
our study of improper double integrals. It is a very important result, neverthe- 
less. In fact, it was very largely to get this result that we developed this 
particular section of the text. Our immediate interest in the integral (22.41-5) is 
because of its connection with the gamma function. This connection is worked 
out in the next paragraph, by a change of variable. But the integral (22.41-5) is 
also of interest in statistics and elsewhere. 

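In (22.41-5) let us make the substitution

\[ x = t^{1/2}, \qquad dx = \tfrac12\, t^{-1/2}\,dt. \]

Then

\[ \frac{\sqrt{\pi}}{2} = \int_0^\infty e^{-x^{2}}\,dx = \frac12 \int_0^\infty t^{-1/2} e^{-t}\,dt = \frac12\, \Gamma\left(\tfrac12\right), \]

by (22.2-1). Thus we have

\[ \Gamma\left(\tfrac12\right) = \sqrt{\pi}. \tag{22.41-6} \]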
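
The comparison of squares with circles that leads to (22.41-5) can be made concrete. The square of side $c$ in the first quadrant contains the quarter-disk of radius $c$ and is contained in the quarter-disk of radius $c\sqrt{2}$, so for a positive integrand the integral over the square is trapped between the two quarter-disk values. The sketch below checks this numerically; the function name `gauss_partial` and the step count are assumptions, and `math.gamma` is used only as an independent check on (22.41-6).

```python
import math

def gauss_partial(c, steps=100_000):
    """Midpoint-rule approximation of the integral of exp(-x^2) over [0, c]."""
    h = c / steps
    return h * sum(math.exp(-((k + 0.5) * h) ** 2) for k in range(steps))

def quarter_disk(radius):
    """The value (pi/4) * (1 - exp(-radius^2)) found with polar co-ordinates."""
    return 0.25 * math.pi * (1.0 - math.exp(-radius * radius))

for c in (1.0, 2.0, 3.0):
    square = gauss_partial(c) ** 2   # integral over the square of side c
    print(c, quarter_disk(c), square, quarter_disk(c * math.sqrt(2.0)))

# Both bounds tend to pi/4, which gives (22.41-5) and then (22.41-6).
print(gauss_partial(6.0), math.sqrt(math.pi) / 2.0)
print(math.gamma(0.5), math.sqrt(math.pi))
```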


EXERCISES 

1. Let the $xy$-plane be given a distribution of static electricity of constant density $\sigma$
per unit area. Show that the resulting field at the point $(0, 0, a)$ on the positive $z$-axis is in
the $z$-direction and of magnitude $2\pi\sigma$. Note that the result is independent of the distance
$a$. The field at a point is the force which would be exerted on a unit positive charge there.

2. The electrostatic potential at $(0, 0, a)$ resulting from a charge of constant density $\sigma$
on a bounded region $R$ in the $xy$-plane is

\[ u = \iint_R \frac{\sigma\,dA}{\left(x^{2} + y^{2} + a^{2}\right)^{1/2}}. \]

Would this integral be convergent if $R$ were taken to be the whole $xy$-plane instead of a
bounded region?

3. Suppose that, in Exercise 1, the charge is placed only on the strip of the $xy$-plane
between the lines $x = \pm b$, where $b > 0$. Show that the electrostatic field at $(0, 0, a)$ is
$4\sigma \tan^{-1}(b/a)$. One method of solving calls for the formula

\[ \int \frac{dx}{(A + Bx^{2})(C + Dx^{2})^{1/2}} = \frac{1}{A} \left(\frac{A}{BC - AD}\right)^{1/2} \tan^{-1} \left[ x \left(\frac{BC - AD}{A(C + Dx^{2})}\right)^{1/2} \right] + \text{Const.}, \]

which is not found in all integral tables.
4. Is the integral J J — ^ 2+ y2 convergent, if R is the first quadrant of the xy-plane?

R 

5. Show that, if $R$ is the first quadrant of the $xy$-plane, the integral $\iint_R y\,e^{-y^{2}(1 + x^{2})}\,dA$ is
convergent. One method is to integrate over the square $0 \le x \le a$, $0 \le y \le a$, and study
what happens as $a \to \infty$. It is possible to show that the integral over the square is equal to

\[ \frac12 \tan^{-1} a - \frac{a e^{-a^{2}}}{2} \int_0^{a^{2}} \frac{e^{-t^{2}}\,dt}{a^{2} + t^{2}}. \]

From this one can find the value of the double integral over $R$.

6. If the integrals $\int_0^\infty f(x)\,dx$ and $\int_0^\infty g(y)\,dy$ are convergent, and if $f(x)$ and $g(y)$ are
never negative, show that the double integral $\iint_R f(x) g(y)\,dA$, over the entire first
quadrant, is convergent and equal to the product

\[ \left(\int_0^\infty f(x)\,dx\right) \left(\int_0^\infty g(y)\,dy\right). \]

This conclusion may not be correct if either or both of the functions $f(x)$, $g(y)$
can assume both positive and negative values. An illustration is afforded by
$\int_0^\infty (1/x) \sin x\,dx$ and $\int_0^\infty e^{-y}\,dy$. If we attempt to compute the double integral using squares
$0 \le x \le a$, $0 \le y \le a$, we obtain the product of the two integrals as the limit when $a \to \infty$. But if
we integrate over the region $R_n$ defined as we shall indicate, it can be shown by estimation of
the integrals over the blocks composing $R_n$ that the double integral over $R_n$ is greater than

7 — 77 ^ r log - 1 + c n , where c n ~>~^r as hence the integral tends to +°°. The 

log(logn) 3 2 

blocks composing $R_n$ are $k\pi \le x \le (k+1)\pi$, $0 \le y \le \log n$ if $k = 0, 2, \ldots, 2n$ and $k\pi \le x \le
(k+1)\pi$, $0 \le y \le \log[\log(\log n)]$ if $k = 1, 3, \ldots, 2n - 1$.

22.5 / FUNCTIONS DEFINED BY IMPROPER INTEGRALS 

There are many important problems in analysis in which one encounters im- 
proper integrals depending on a parameter; if such an integral is convergent, it 
defines a function of the parameter. A typical situation is that of the integral 
(22.2-1) defining the gamma function. As a general type of problem suppose we 
have 


\[ F(x) = \int_c^\infty f(t, x)\,dt, \tag{22.5-1} \]

where the integral is of first kind and convergent for each value of $x$ in some
interval. In practice it is important to know whether or not $F$ is continuous; also,
under certain conditions it is possible to deal with integration and differentiation
of $F(x)$ according to the formulas

\[ F'(x) = \int_c^\infty \frac{\partial f}{\partial x}\,dt, \]

\[ \int_a^b F(x)\,dx = \int_c^\infty dt \int_a^b f(t, x)\,dx. \]

It is important to know when these formulas are legitimate. The treatment of 
these questions is quite analogous to the treatment of the corresponding
questions in the case of functions defined by infinite series. The key concept is that
of uniform convergence, just as it was in Chapter 20.

Definition. Suppose that the integral (22.5-1) is convergent for each $x$ in the
interval $a \le x \le b$. We say that it is uniformly convergent on that interval if to
each $\epsilon > 0$ corresponds a number $s_0$, depending (possibly) on $\epsilon$, but not on $x$,
such that

\[ \left| F(x) - \int_c^{s} f(t, x)\,dt \right| < \epsilon \]

whenever $a \le x \le b$ and $s_0 \le s$.


This definition should be compared with the definition of uniform
convergence of an infinite sequence of functions, in §20.1. The $f_n(x)$ of that
definition is analogous to $\int_c^{s} f(t, x)\,dt$ in our present definition, and $n$ is
analogous to $s$. The parallel between Chapter 20 and our present work is so close that
we shall in the main merely state the important theorems and omit the proofs. A 
student who understands Chapter 20 thoroughly will have little additional 
difficulty in appreciating the present section. He must also be familiar with §18.5, 
in which similar problems for proper integrals depending on a parameter are 
treated (see particularly Theorems XIII and XIV of §18.5). 


THEOREM VII. Let $f(t, x)$ be continuous in the two variables $t$, $x$ when $c \le t$
and $a \le x \le b$, and let the integral

\[ F(x) = \int_c^\infty f(t, x)\,dt \tag{22.5-2} \]

be uniformly convergent on the interval $[a, b]$. Then $F$ is continuous on the
interval.


This corresponds to Theorem III, §20.3. 

There is a convenient practical test for uniform convergence, analogous to 
the M-test for series (Theorem II, §20.2). It is the following: 

THEOREM VIII. Suppose $g(t) \ge 0$, and that the integral of first kind

$$\int_c^\infty g(t)\,dt$$

is convergent. Also suppose $|f(t, x)| \le g(t)$ when $a \le x \le b$, for all values of t
beyond some fixed value $(t_0 \le t)$. Then the integral (22.5-2) is uniformly
convergent on the interval [a, b].

There are analogous theorems for improper integrals of the second kind, 
with the obvious modifications in notation. 
As an application, we can now prove that $\Gamma(x)$ is continuous when x > 0. We
write

$$\Gamma(x) = \int_0^1 t^{x-1} e^{-t}\,dt + \int_1^\infty t^{x-1} e^{-t}\,dt. \tag{22.5-3}$$

We prove that $\Gamma(x)$ is continuous on every interval $a \le x \le b$ with 0 < a < b.
Having fixed a and b, we observe that, if $a \le x \le b$ and $t \ge 1$,
$0 < t^{x-1} e^{-t} \le t^{b-1} e^{-t}$. Since

$$\int_1^\infty t^{b-1} e^{-t}\,dt$$

is convergent, the uniform convergence of the second integral in (22.5-3) follows
by Theorem VIII. If $x \ge 1$, the first integral in (22.5-3) is proper, and is a
continuous function of x by Theorem XIII, §18.5. When the integral is improper,
however, we use the theorem corresponding to Theorem VIII for integrals of
second kind, noting that

$$0 < t^{x-1} e^{-t} \le t^{a-1} e^{-t}$$

if $0 < t \le 1$ and $0 < a \le x \le b$, and that

$$\int_0^1 t^{a-1} e^{-t}\,dt$$

is convergent. Thus $\Gamma(x)$, being the sum of two continuous functions, is
continuous.


THEOREM IX. Under the hypothesis of Theorem VII it is true that

$$\int_a^b F(x)\,dx = \int_c^\infty dt \int_a^b f(t, x)\,dx. \tag{22.5-4}$$


This corresponds to Theorem IV, §20.4. 

THEOREM X. Let the integral

$$F(x) = \int_c^\infty f(t, x)\,dt$$

be convergent when $a \le x \le b$. Let the partial derivative $\partial f/\partial x$ be continuous in the
two variables t, x when $c \le t$ and $a \le x \le b$, and let the integral
$\int_c^\infty \frac{\partial f}{\partial x}\,dt$ converge uniformly on [a, b]. Then F(x) has a derivative given by

$$F'(x) = \int_c^\infty \frac{\partial f}{\partial x}\,dt. \tag{22.5-5}$$

We shall prove this theorem, which corresponds to Theorem V, §20.5. We
set

$$G(x) = \int_c^\infty \frac{\partial f}{\partial x}\,dt.$$

Applying Theorem IX, we have

$$\int_a^x G(u)\,du = \int_c^\infty dt \int_a^x \frac{\partial f(t, u)}{\partial u}\,du
= \int_c^\infty [f(t, x) - f(t, a)]\,dt
= F(x) - F(a).$$

Since G is a continuous function (by Theorem VII), we know from the last
formula that

$$F'(x) = G(x)$$

(Theorem VII, §1.52). This result is equivalent to (22.5-5).

With Theorem X we can justify the formula (22.2-8) for $\Gamma'(x)$, for the
integrals

$$\int_0^1 t^{x-1} (\log t)\, e^{-t}\,dt, \qquad \int_1^\infty t^{x-1} (\log t)\, e^{-t}\,dt$$

are uniformly convergent, as may be shown by arguments very similar to those
employed in connection with (22.5-3). The procedure extends at once to the
higher derivatives, so that, for n = 1, 2, ... and x > 0,

$$\Gamma^{(n)}(x) = \int_0^\infty t^{x-1} (\log t)^n e^{-t}\,dt. \tag{22.5-6}$$

The theorems of this section have applications in justifying the steps in 
various ingenious methods by which one can sometimes find the value of an 
improper integral. 


Example 1. Show that if 0 < a < b,

$$\int_0^\infty \frac{e^{-at} - e^{-bt}}{t}\,dt = \log\frac{b}{a}. \tag{22.5-7}$$

The integral is of first kind, for the integrand approaches the finite limit b - a
as $t \to 0$. The ingenious device here consists in starting from the formula

$$\frac{1}{x} = \int_0^\infty e^{-xt}\,dt, \tag{22.5-8}$$

which is valid if x > 0; this formula is obtained by direct evaluation, using the
indefinite integral

$$\int e^{-xt}\,dt = -\frac{1}{x} e^{-xt} + C.$$

The integral (22.5-8) is uniformly convergent on the interval $a \le x \le b$, by
Theorem VIII and the fact that $e^{-xt} \le e^{-at}$ when $t \ge 0$. Hence, by (22.5-4),

$$\int_a^b \frac{dx}{x} = \int_0^\infty dt \int_a^b e^{-xt}\,dx.$$

When we calculate the integral on the left and the inner integral on the right, we
obtain (22.5-7).
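
As a sanity check on (22.5-7), the integral can be approximated numerically and compared with log(b/a). The sketch below is only illustrative: the helper names `simpson` and `frullani`, the truncation point, and the step count are assumptions made for this example and are not part of the text.

```python
import math

def integrand(t, a, b):
    """(e^{-at} - e^{-bt}) / t, extended by its limit b - a at t = 0."""
    if t == 0.0:
        return b - a
    return (math.exp(-a * t) - math.exp(-b * t)) / t

def simpson(f, lo, hi, n):
    """Composite Simpson rule with n (even) subintervals on [lo, hi]."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3

def frullani(a, b, n=60000):
    """Approximate the integral in (22.5-7) for 0 < a < b.

    The range is truncated at t = 60/a; the neglected tail is smaller
    than e^{-60}, which is far below the quadrature error.
    """
    return simpson(lambda t: integrand(t, a, b), 0.0, 60.0 / a, n)

if __name__ == "__main__":
    for a, b in [(1.0, 2.0), (0.5, 3.0)]:
        print(a, b, frullani(a, b), math.log(b / a))
```

The two printed columns should agree to many decimal places; the agreement illustrates the formula but of course does not replace the uniform-convergence argument.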


Example 2. Show that, if a > 0,

$$\int_0^\infty \frac{e^{-at} \sin xt}{t}\,dt = \tan^{-1}\frac{x}{a}. \tag{22.5-9}$$

We denote the integral here by F(x). We are justified in calculating F'(x) by
Theorem X, getting

$$F'(x) = \int_0^\infty e^{-at} \cos xt\,dt, \tag{22.5-10}$$

for this latter integral is easily seen to converge uniformly for all values of x.
Now, by an elementary integration formula

$$\int_0^s e^{-at} \cos xt\,dt = \left[ \frac{e^{-at}(x \sin xt - a \cos xt)}{a^2 + x^2} \right]_0^s,$$

and hence, as $s \to \infty$, we have, after an easy calculation,

$$\int_0^\infty e^{-at} \cos xt\,dt = \frac{a}{a^2 + x^2}. \tag{22.5-11}$$

Therefore, by integration, taking account of (22.5-10),

$$F(x) = \tan^{-1}\frac{x}{a} + C.$$

To determine the constant of integration C we observe that F(0) = 0 by the
definition of F(x). Since $\tan^{-1} 0 = 0$ also, we conclude that C = 0. This completes
the derivation of (22.5-9).

Example 3. Show that, if x > 0,

$$\int_0^\infty \frac{\sin xt}{t}\,dt = \frac{\pi}{2}. \tag{22.5-12}$$

This result is deduced from (22.5-9). We consider the integral in the latter
formula as a function of the parameter a, writing

$$G(a) = \int_0^\infty e^{-at}\,\frac{\sin xt}{t}\,dt.$$

Note that $G(a) = \tan^{-1}(x/a)$ if a > 0, while

$$G(0) = \int_0^\infty \frac{\sin xt}{t}\,dt.$$

We regard x as fixed. This last integral is convergent, by Theorem VI, §22.3.
Now it can be shown that the integral defining G(a) is uniformly convergent when
$a \ge 0$, so that G is continuous for such values of a, by Theorem VII. Then
$G(a) \to G(0)$ as $a \to 0^+$. Now

$$\lim_{a \to 0^+} G(a) = \lim_{a \to 0^+} \tan^{-1}\frac{x}{a} = \frac{\pi}{2}$$

if x > 0. Thus (22.5-12) is established. The assertion about uniform convergence
is discussed in Exercise 20.

It is easily shown that the integral in (22.5-12) has the value $-\pi/2$ if x < 0, for
the integral defines an odd function of x. Thus

$$\int_0^\infty \frac{\sin xt}{t}\,dt =
\begin{cases}
\pi/2 & \text{if } x > 0, \\
0 & \text{if } x = 0, \\
-\pi/2 & \text{if } x < 0.
\end{cases} \tag{22.5-13}$$

From this it is clear that the function defined by the integral is discontinuous at
x = 0. The integral must therefore fail to be uniformly convergent in any closed
interval which contains x = 0.


EXERCISES 

1. (a) Let $F(x) = \int_0^\infty e^{-t^2} \cos xt\,dt$. Assume the applicability of Theorem X, and
show that $F'(x) = -\tfrac{1}{2} x F(x)$. Then find F(x).

(b) By change of variable in the result of (a) show that

$$\int_0^\infty e^{-at^2} \cos xt\,dt = \frac{1}{2} \sqrt{\frac{\pi}{a}}\; e^{-x^2/4a}, \qquad a > 0.$$

(c) By suitable use of Theorem VIII, show that the integral

$$\int_0^\infty t e^{-t^2} \sin xt\,dt$$

is convergent uniformly with respect to x for all values of x, thus justifying the procedure
used in (a).

2. (a) Show that, for all x,

$$\int_0^\infty e^{-t^2 - (x^2/t^2)}\,dt = \frac{\sqrt{\pi}}{2}\, e^{-2|x|},$$

by denoting the integral by F(x) and showing that F'(x) = -2F(x) when x > 0. The
substitution u = x/t is useful at a certain stage in the work. Explain how you justify the
answer when x < 0.

(b) Deduce from (a) the result

$$\int_0^\infty e^{-px^2 - (q/x^2)}\,dx = \frac{1}{2} \sqrt{\frac{\pi}{p}}\; e^{-2\sqrt{pq}}, \qquad p > 0,\ q \ge 0.$$

(c) Prove that the integral F(x) in (a) is convergent uniformly for all values of x, and that
the integral $\int_0^\infty (x/t^2)\, e^{-t^2 - (x^2/t^2)}\,dt$ is convergent uniformly for $\delta \le x \le M$ if $\delta > 0$. Note that
the method of calculating F'(x) by differentiating under the integral sign gives a false result at
x = 0. This is explained by the fact that the foregoing integral is not uniformly convergent in
any interval including x = 0.

3. (a) Deduce the result

$$\int_0^\infty t^n e^{-xt}\,dt = \frac{n!}{x^{n+1}} \qquad (x > 0)$$

by repeated differentiation of both sides of the formula

$$\int_0^\infty e^{-xt}\,dt = \frac{1}{x}.$$


(b) Prove that it is legitimate to differentiate under the integral sign as was done in (a). 
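4. (a) Deduce the result

$$\int_0^\infty \frac{dt}{(t^2 + x)^{n+1}} = \frac{1 \cdot 3 \cdots (2n - 1)}{2 \cdot 4 \cdots 2n} \cdot \frac{\pi}{2} \cdot \frac{1}{x^{n + (1/2)}} \qquad (x > 0)$$

by repeated differentiation of both sides of the formula

$$\int_0^\infty \frac{dt}{t^2 + x} = \frac{\pi}{2}\, x^{-1/2}.$$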


(b) Prove that it is legitimate to differentiate under the integral sign as was done in (a). 

5. (a) Let $F_n(x) = \int_0^\infty t^n e^{-xt^2}\,dt$, x > 0. Show that $F_n'(x) = -F_{n+2}(x)$. Calculate $F_0(x)$
explicitly from the result (22.41-5), and then use the foregoing relation between $F_n'$ and
$F_{n+2}$ to prove by induction that

$$\int_0^\infty t^{2n} e^{-xt^2}\,dt = \frac{\sqrt{\pi}}{2} \cdot \frac{1 \cdot 3 \cdots (2n - 1)}{2^n} \cdot \frac{1}{x^{n + (1/2)}}.$$

(b) Prove that the integral defining $F_n(x)$ converges uniformly for $x \ge \delta$ if $\delta > 0$.

6. Show that $\int_0^\infty e^{-t^2} \sin 2xt\,dt = e^{-x^2} \int_0^x e^{u^2}\,du$.

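7. Show that

$$\frac{d^n}{dx^n} \left( \frac{1}{1 + x^2} \right) = (-1)^{n/2} \int_0^\infty t^n e^{-t} \cos xt\,dt, \qquad n \text{ even},$$

$$\frac{d^n}{dx^n} \left( \frac{1}{1 + x^2} \right) = (-1)^{(n+1)/2} \int_0^\infty t^n e^{-t} \sin xt\,dt, \qquad n \text{ odd}.$$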


8. Prove that [ — 3 — dt = ~ x(x > 0). Assume that differentiation under the in- 
Jo t 2 

tegral sign is legitimate. 

9. Show that if $F(x) = \int_0^\infty \frac{1 - e^{-xt^2}}{t^2}\,dt$, where $x \ge 0$, then F'(x) can be computed by
differentiation under the integral sign when x > 0. Deduce the value of F(x).


10. (a) Start with

$$\frac{t}{1 + t^2} = \int_0^\infty e^{-tx} \cos x\,dx \qquad (t > 0),$$

and obtain

$$\int_0^\infty \frac{e^{-ax} - e^{-bx}}{x} \cos x\,dx = \frac{1}{2} \log\frac{1 + b^2}{1 + a^2}$$

if a and b are positive. Upon what theorem do you depend?

(b) Assume without proof that, as a function of b, the integral in (a) is uniformly
convergent when $b \ge 0$. Explain carefully how this implies that, if a > 0,

$$\int_0^\infty \frac{1 - e^{-ax}}{x} \cos x\,dx = \frac{1}{2} \log(1 + a^2).$$

What theorem do you use?

11. Prove that, if a and b are positive, 

j . c , Al _ , r L A , f“tan _1 (bt)“ tan _1 (at) J4 

and in this way find the value of the integral at. 

Jo t 


12. Show that, under certain conditions 


f(bx)-f(ax) 


=/>/:* 


Apply this to show that 


cos bx - cos ax 


dx = 7r(a - b) 


if a and b are positive. Take for granted that the “certain conditions” are satisfied. 


13. Prove that j e x — 

t r . 
* FTP I * S1 


dx = 2 log(l + y 2 ). 


sin xt dx. 


14. Prove that 


r e -*«*2zL dx ^r e -* dt ' 

Jo x Jo 

15. From $\int_0^\infty e^{-y^2 x^2}\,dx = \frac{\sqrt{\pi}}{2y}$ (y > 0) deduce by integration that

$$\int_0^\infty \frac{e^{-a^2 x^2} - e^{-b^2 x^2}}{x^2}\,dx = (b - a)\sqrt{\pi}$$

if 0 < a < b.

16. Show that 


r£l|ini d( = 

Jo t 
17. From (22.5-13) and trigonometric identities, deduce that

$$\int_0^\infty \frac{\sin x \cos mx}{x}\,dx =
\begin{cases}
\pi/2 & \text{if } |m| < 1, \\
0 & \text{if } |m| > 1, \\
\pi/4 & \text{if } |m| = 1.
\end{cases}$$


f COS JtV 

18. Prove that J \+ x * d x converges uniformly with respect to y for all y. 

f 00 djc 

19. Prove that I converges uniformly with respect to y if y ^ 8, where S > 0. 

J l 

20. Prove that the integral $G(a) = \int_0^\infty e^{-at}\,\frac{\sin xt}{t}\,dt$, where x is fixed, is convergent
uniformly with respect to a when $a \ge 0$.

Suggestion: Put $u = 1/t$, $dv = e^{-at} \sin xt\,dt$, and integrate by parts between t = T
and $t = \infty$, where T > 0. In this way show that



sin xt 
t 


dt 


“? + ?T 


Explain why this inequality insures uniform convergence. (Consider the maximum of the
expression on the right as a varies, assuming $x \ne 0$.)


22.51 / LAPLACE TRANSFORMS 

A function f(s) defined by

$$f(s) = \int_0^\infty e^{-st} F(t)\,dt \tag{22.51-1}$$

is called the Laplace transform of the function F(t). For instance, by (22.5-9)
we see that

$$f(s) = \tan^{-1}\left(\frac{1}{s}\right) \quad \text{if} \quad F(t) = \frac{\sin t}{t},$$

and by Exercise 5, §22.2, we see that

$$f(s) = \frac{\Gamma(n)}{s^n} \quad \text{if} \quad F(t) = t^{n-1}.$$

Laplace transforms are used a good deal in applied mathematics in solving 
differential equations. We do not have space in this book for an extensive account 
of the theory and applications of Laplace transforms. 

The theorems of §22.5 are useful in the discussion of Laplace transforms.
The exponential $e^{-st}$ in (22.51-1) causes the integral to converge very nicely,
provided that F(t) does not grow too rapidly as $t \to \infty$. If there is some constant c
such that $|F(t)| \le e^{ct}$ for all sufficiently large values of t, the integral defining f(s)
converges when s > c, and f has derivatives of all orders, which can be
calculated by differentiating under the integral sign.
One of the most important properties of the Laplace transform is revealed
when we integrate by parts in (22.51-1), taking u = F(t), $dv = e^{-st}\,dt$. We first
obtain

$$\int_0^b e^{-st} F(t)\,dt = -\frac{1}{s}\, e^{-st} F(t) \Big|_0^b + \frac{1}{s} \int_0^b e^{-st} F'(t)\,dt.$$

Then, letting $b \to \infty$, we get

$$\int_0^\infty e^{-st} F(t)\,dt = \frac{1}{s} F(0) + \frac{1}{s} \int_0^\infty e^{-st} F'(t)\,dt, \tag{22.51-2}$$

provided certain conditions are satisfied. These conditions must be such as to
guarantee the convergence of the integrals, and also such as to guarantee that
$e^{-st} F(t) \to 0$ when $t \to \infty$.

It is convenient to denote the Laplace transform of a function by using a
symbol L in this way: f(s) = L{F(t)}. Formula (22.51-2) can now be written

$$L\{F'(t)\} = sL\{F(t)\} - F(0). \tag{22.51-3}$$

If the procedure can be repeated with F' in place of F, we obtain

$$L\{F''(t)\} = sL\{F'(t)\} - F'(0).$$

If in this result we substitute for L{F'(t)} from (22.51-3), we obtain

$$L\{F''(t)\} = s^2 L\{F(t)\} - sF(0) - F'(0). \tag{22.51-4}$$

Formulas for the Laplace transform of higher-order derivatives can be found by 
carrying on the process. 

The foregoing formulas provide the basis for using Laplace transforms to 
solve differential equations. As a preliminary to further discussion of this 
question we observe the following formulas: 

$$L\{\sin at\} = \int_0^\infty e^{-st} \sin at\,dt = \frac{a}{s^2 + a^2}, \tag{22.51-5}$$

$$L\{\cos at\} = \int_0^\infty e^{-st} \cos at\,dt = \frac{s}{s^2 + a^2}. \tag{22.51-6}$$

These may be proved by direct use of elementary integration formulas [see the 
derivation of (22.5-11)]. 

Now we consider a differential equation problem. 

Example. Find the function y = F(t) such that

$$\frac{d^2 y}{dt^2} + 4y = 3 \sin t \tag{22.51-7}$$

and

F(0) = 1, F'(0) = 0.

We assume that the problem has a solution, and we let f(s) = L{F(t)}. From 





(22.51-7) we see that

$$L\{F''(t)\} + 4L\{F(t)\} = 3L\{\sin t\}.$$

We use (22.51-4) and (22.51-5) to obtain

$$s^2 f(s) - s \cdot 1 - 0 + 4f(s) = \frac{3}{s^2 + 1}.$$

We solve this equation to find f(s):

$$f(s) = \frac{3}{(s^2 + 4)(s^2 + 1)} + \frac{s}{s^2 + 4}.$$

The first fraction on the right can be expressed in terms of simpler fractions, and
we find

$$f(s) = \frac{1}{s^2 + 1} - \frac{1}{s^2 + 4} + \frac{s}{s^2 + 4}.$$

We now observe by (22.51-5) and (22.51-6) that

$$f(s) = L\{\sin t\} - \tfrac{1}{2} L\{\sin 2t\} + L\{\cos 2t\}
= L\{\sin t - \tfrac{1}{2} \sin 2t + \cos 2t\}.$$

Since f(s) = L{F(t)}, this suggests that

$$F(t) = \sin t - \tfrac{1}{2} \sin 2t + \cos 2t.$$

A check shows that this function satisfies the differential equation and the initial 
conditions. 
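
The check mentioned above can also be carried out numerically. The following sketch takes the candidate solution F(t) = sin t - (1/2) sin 2t + cos 2t and estimates its derivatives by central differences; the function names and step sizes are choices made for this illustration, not anything prescribed by the text.

```python
import math

def F(t):
    """Candidate solution from the example: sin t - (1/2) sin 2t + cos 2t."""
    return math.sin(t) - 0.5 * math.sin(2 * t) + math.cos(2 * t)

def first_derivative(f, t, h=1e-6):
    """Central-difference estimate of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

def second_derivative(f, t, h=1e-4):
    """Central-difference estimate of f''(t)."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / (h * h)

def residual(t):
    """F''(t) + 4 F(t) - 3 sin t, which should vanish up to rounding error."""
    return second_derivative(F, t) + 4 * F(t) - 3 * math.sin(t)

if __name__ == "__main__":
    print("F(0)  =", F(0.0))
    print("F'(0) ~", first_derivative(F, 0.0))
    for t in (0.3, 1.0, 2.5, 7.0):
        print(t, residual(t))
```

The residuals are not exactly zero because the derivatives are only approximated; an exact verification is the differentiation by hand that the text has in mind.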

This example illustrates a procedure which can be developed into a sys- 
tematic technique for solving differential equations of certain types. Of course, 
the foregoing problem could also have been solved by a variety of standard 
elementary methods. 

EXERCISES 

1. Verify that $L\{1\} = s^{-1}$ and $L\{e^{ct}\} = (s - c)^{-1}$.

2. Derive the formulas

$$L\{\sinh at\} = \frac{a}{s^2 - a^2}, \qquad L\{\cosh at\} = \frac{s}{s^2 - a^2}.$$

3. Find a function y = F(t) such that y" + 2y' = e * and F(0) = 1, F'(0) = 0. Use the 
results in Exercise 1. 

4. Find a function y = F(t) such that (a) y" + 9y = cos2t, F(0)=-1, F'(0) = 
1; (b) y"-4y = — 3 sinh t, F(0) = 0, F'(0) = 5. 

5. If L{F(t)} = f(s), show that $L\{e^{ct} F(t)\} = f(s - c)$. Use this to find Laplace transforms
of (a) $t^{n-1} e^{-t}$; (b) $e^{-t} \sin t$; (c) $e^{2t} \cos 3t$.
22.6 / REPEATED IMPROPER INTEGRALS


We sometimes encounter improper integrals whose integrands are improper
integrals, e.g.,

$$\int_0^\infty dy \int_0^\infty e^{-xy} \sin x\,dx, \qquad \int_0^\infty dy \int_0^\infty y^{-1/4} x^{-1/2} e^{-y(1+x)}\,dx.$$

It is often highly useful, as a technique in solving problems, to be able to invert 
the order of integration in such integrals. The legitimacy of such an inversion of 
order is not guaranteed by Theorem IX of §22.5. We do not intend to take up 
theorems which deal with this problem. Our main intent is to point out that the 
inversion of the order of integration requires proof. Such proofs are sometimes 
rather delicate, and a systematic treatment of them would be too long and 
difficult for this book. We shall nevertheless give some examples of the use of 
inversions of the order of integration. 


Example 1. Assuming the correctness of

$$\int_0^\infty dy \int_0^\infty e^{-xy} \sin x\,dx = \int_0^\infty dx \int_0^\infty e^{-xy} \sin x\,dy, \tag{22.6-1}$$

we can easily show that

$$\int_0^\infty \frac{\sin x}{x}\,dx = \frac{\pi}{2}, \tag{22.6-2}$$

for the right side of (22.6-1) is $\int_0^\infty \frac{\sin x}{x}\,dx$, by (22.5-8). The inner integral on the
left is handled in a manner similar to that of deriving (22.5-11). The result is

$$\int_0^\infty e^{-xy} \sin x\,dx = \frac{1}{1 + y^2}, \qquad y > 0. \tag{22.6-3}$$

Thus (22.6-1) becomes

$$\int_0^\infty \frac{dy}{1 + y^2} = \int_0^\infty \frac{\sin x}{x}\,dx.$$

The left side here is $\lim_{y \to \infty} \tan^{-1} y = \pi/2$. Thus (22.6-2) is established. We thus
have an alternative to the method of Example 3, §22.5.


Example 2. Show that

$$\int_0^\infty \frac{\sin t}{\sqrt{t}}\,dt = \frac{2}{\sqrt{\pi}} \int_0^\infty \frac{ds}{1 + s^4}. \tag{22.6-4}$$

To get started, substitute $x = s\sqrt{t}$ in (22.41-5), taking s as the new variable
of integration. The result is

$$\frac{2}{\sqrt{\pi}} \int_0^\infty e^{-s^2 t}\,ds = \frac{1}{\sqrt{t}}.$$

Thus

$$\int_0^\infty \frac{\sin t}{\sqrt{t}}\,dt = \frac{2}{\sqrt{\pi}} \int_0^\infty dt \int_0^\infty e^{-s^2 t} \sin t\,ds.$$

If we now assume that it is legitimate to invert the order of integration on the
right, we obtain the repeated integral

$$\frac{2}{\sqrt{\pi}} \int_0^\infty ds \int_0^\infty e^{-s^2 t} \sin t\,dt.$$

The inner integral here is evaluated by (22.6-3), and we have

$$\frac{2}{\sqrt{\pi}} \int_0^\infty \frac{ds}{1 + s^4}; \tag{22.6-5}$$

thus (22.6-4) has been derived. The integral in (22.6-5) can be calculated by
elementary methods, and is found to have the value $\pi/(2\sqrt{2})$ (see Exercise 9). Thus
we arrive at the formula

$$\int_0^\infty \frac{\sin t}{\sqrt{t}}\,dt = \left( \frac{\pi}{2} \right)^{1/2}. \tag{22.6-6}$$

EXERCISES 

In the following exercises it is to be assumed by the student that reversal of the order of 
integration in the repeated integrals is legitimate. 

1. Let $I = \int_0^\infty e^{-x^2}\,dx$. If x > 0, show that $\int_0^\infty x e^{-x^2 y^2}\,dy = I$. Thus

$$I^2 = \int_0^\infty e^{-x^2}\,dx \int_0^\infty x e^{-x^2 y^2}\,dy.$$

Now reverse the order of integration and deduce the value of I. This is an alternative
derivation of (22.41-5).

2. (a) Verify that $2 \int_0^\infty y e^{-(1 + x^2) y^2}\,dy = 1/(1 + x^2)$. From this deduce that

$$\int_0^\infty \frac{\cos ax}{1 + x^2}\,dx = \frac{\pi}{2}\, e^{-|a|}.$$

Use the result of Exercise 1(b), §22.5.

(b) Use the method of (a) to show that $\int_0^\infty \frac{x \sin ax}{1 + x^2}\,dx = \frac{\pi}{2}\, e^{-a}$, a > 0. In the process you
will have to deal with $\int_0^\infty x e^{-x^2 y^2} \sin ax\,dx$ by integration by parts, and with
$\int_0^\infty y^{-2} e^{-y^2 - (a^2/4y^2)}\,dy$ by an appropriate substitution of a new variable of integration.

3. By reversing the order of integration in the repeated integral

$$\int_0^\infty dy \int_0^\infty y^{-1/4} x^{-1/2} e^{-y(1+x)}\,dx,$$

show that

$$\int_0^\infty \frac{dx}{x^{1/2} (1 + x)^{3/4}} = \frac{\Gamma(\tfrac{1}{4})\, \Gamma(\tfrac{1}{2})}{\Gamma(\tfrac{3}{4})}.$$


4. Use the formula $1/(1 + x^2) = \int_0^\infty e^{-xt} \sin t\,dt$ to show that




dt 


if 0 < a < 2. 


5. Show that, if $a \ge 0$, $\int_0^\infty \frac{e^{-ax}}{1 + x^2}\,dx = \int_0^\infty \frac{\sin t}{a + t}\,dt$.

6. Show that f a * dx = f dt, and that, if a > 0, these integrals are 

Jo i + x Jo a + t 

J mco x sin cix 

— j — — 2 dx. From this result deduce that the value of the first integral is 
o 1 + x 

ye Ha| . Assume the legitimacy of all procedures of reversing order of integration and 
differentiation under the integral sign. 


7. Show that 


f f a \ e * vT dt = if a>0. Use this and the result 

Jo x 


f e xVF sin x dx = yy— - to show that f y— dt = 2T(2a) f dx if 0 < a < 1. 

Jo 1 + 1 Jo 1 + 1 Jo x 

8. Use the formula f t a ~'e~ xt dt = ^p(x > 0, a > 0) to show that f cos / dx = 
Jo x Jo x 

ffe/„ IT?* if0<a<1 ' 

9. Prove that

$$\int_0^\infty \frac{dx}{1 + x^4} = \frac{\pi \sqrt{2}}{4}$$

by expressing the integrand in terms of partial
fractions with quadratic denominators and deriving the integration formula

$$\int \frac{dx}{1 + x^4} = \frac{\sqrt{2}}{8} \log \frac{x^2 + x\sqrt{2} + 1}{x^2 - x\sqrt{2} + 1} + \frac{\sqrt{2}}{4} \left[ \tan^{-1}(x\sqrt{2} - 1) + \tan^{-1}(x\sqrt{2} + 1) \right].$$


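10. (a) Prove that $\int_0^\infty \frac{x^2\,dx}{1 + x^4} = \int_0^\infty \frac{dx}{1 + x^4}$ by substituting $y = \frac{1}{x}$ in the integral.

(b) Prove that $\int_0^\infty \frac{\cos t}{\sqrt{t}}\,dt = \left( \frac{\pi}{2} \right)^{1/2}$ by the method of illustrative Example 2.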


22.7 / THE BETA FUNCTION 

The beta function is defined by 

$$B(x, y) = \int_0^1 t^{x-1} (1 - t)^{y-1}\,dt. \tag{22.7-1}$$

If $x \ge 1$ and $y \ge 1$, the integral is proper. If x > 0 and y > 0, and either x < 1 or
y < 1 (or both), the integral is improper, but convergent.

We shall show that there is an important connection between the beta 
function and the gamma function. In fact, if x and y are positive,

$$B(x, y) = \frac{\Gamma(x)\,\Gamma(y)}{\Gamma(x + y)}. \tag{22.7-2}$$


We shall verify this relation without trying to explain how it might be
discovered.

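To begin with, let us substitute

$$t = \frac{u}{1 + u}, \qquad dt = \frac{du}{(1 + u)^2}$$

in (22.7-1), thus getting

$$B(x, y) = \int_0^\infty \left( \frac{u}{1 + u} \right)^{x-1} \left( \frac{1}{1 + u} \right)^{y-1} \frac{du}{(1 + u)^2}
= \int_0^\infty \frac{u^{x-1}}{(1 + u)^{x+y}}\,du. \tag{22.7-3}$$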

Next, in the formula (22.2-1) let us substitute t = vu, dt = v du, (v > 0). Then

$$\Gamma(x) = \int_0^\infty (vu)^{x-1} e^{-vu}\, v\,du = \int_0^\infty v^x u^{x-1} e^{-vu}\,du.$$

Multiply this by $v^{y-1} e^{-v}$ and integrate with respect to v:

$$\Gamma(x) \int_0^\infty v^{y-1} e^{-v}\,dv = \int_0^\infty dv \int_0^\infty v^{x+y-1} u^{x-1} e^{-v(1+u)}\,du.$$

The left side here is $\Gamma(x)\Gamma(y)$. We assume that we can reverse the order of
integration on the right. Then we have

$$\Gamma(x)\Gamma(y) = \int_0^\infty u^{x-1}\,du \int_0^\infty v^{x+y-1} e^{-v(1+u)}\,dv.$$

In the inner integral set

$$v = \frac{w}{1 + u}, \qquad dv = \frac{dw}{1 + u}.$$

Then

$$\int_0^\infty v^{x+y-1} e^{-v(1+u)}\,dv = \frac{1}{(1 + u)^{x+y}} \int_0^\infty w^{x+y-1} e^{-w}\,dw = \frac{\Gamma(x + y)}{(1 + u)^{x+y}},$$

and so

$$\Gamma(x)\Gamma(y) = \Gamma(x + y) \int_0^\infty \frac{u^{x-1}}{(1 + u)^{x+y}}\,du = \Gamma(x + y)\, B(x, y),$$

by (22.7-3). Thus (22.7-2) is established. The justification of the reversal of order
of integration is a problem of the type referred to in §22.6. There are various
methods for proving (22.7-2). Some of the methods use the theory of improper
double integrals, as sketched in §22.41.
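
Relation (22.7-2) can be spot-checked numerically when x and y are at least 1, so that the integral (22.7-1) is proper. The sketch below compares a Simpson-rule estimate of (22.7-1) with the gamma-function quotient computed by Python's `math.gamma`; the function names and the step count are assumptions of this illustration.

```python
import math

def beta_quadrature(x, y, n=20000):
    """Simpson estimate of (22.7-1); intended for x >= 1 and y >= 1 only."""
    def f(t):
        return t ** (x - 1) * (1 - t) ** (y - 1)
    h = 1.0 / n
    total = f(0.0) + f(1.0)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3

def beta_via_gamma(x, y):
    """Right side of (22.7-2)."""
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)

if __name__ == "__main__":
    for x, y in [(3, 4), (2.5, 3.5), (1, 2)]:
        print(x, y, beta_quadrature(x, y), beta_via_gamma(x, y))
```

For x = 3, y = 4 both columns should be close to 2! 3! / 6! = 1/60. The improper cases (x < 1 or y < 1) would need a quadrature that handles the endpoint singularities and are deliberately left out here.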

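A useful alternative form for the beta function comes from putting

$$t = \sin^2 \theta, \qquad dt = 2 \sin\theta \cos\theta\,d\theta$$

in (22.7-1). The new formula is

$$B(x, y) = 2 \int_0^{\pi/2} \sin^{2x-1}\theta\, \cos^{2y-1}\theta\,d\theta. \tag{22.7-4}$$

In particular, with x = n, $y = \tfrac{1}{2}$ we have

$$\int_0^{\pi/2} \sin^{2n-1}\theta\,d\theta = \tfrac{1}{2} B(n, \tfrac{1}{2}) = \frac{1}{2}\, \frac{\Gamma(n)\,\Gamma(\tfrac{1}{2})}{\Gamma(n + \tfrac{1}{2})}. \tag{22.7-5}$$

By repeated use of (22.2-5) we can write

$$\Gamma(n + \tfrac{1}{2}) = (n - \tfrac{1}{2})\,\Gamma(n - \tfrac{1}{2}) = (n - \tfrac{1}{2})(n - \tfrac{3}{2})\,\Gamma(n - \tfrac{3}{2})
= (n - \tfrac{1}{2})(n - \tfrac{3}{2}) \cdots \tfrac{1}{2}\,\Gamma(\tfrac{1}{2}),$$

or

$$\Gamma(n + \tfrac{1}{2}) = \frac{1 \cdot 3 \cdot 5 \cdots (2n - 1)}{2^n}\,\Gamma(\tfrac{1}{2}). \tag{22.7-6}$$

When this is combined with (22.7-5) we have

$$\int_0^{\pi/2} \sin^{2n-1}\theta\,d\theta = \frac{2^{n-1}\,\Gamma(n)}{1 \cdot 3 \cdot 5 \cdots (2n - 1)} = \frac{2 \cdot 4 \cdots (2n - 2)}{1 \cdot 3 \cdots (2n - 1)}. \tag{22.7-7}$$

In a similar manner it may be shown that

$$\int_0^{\pi/2} \sin^{2n}\theta\,d\theta = \frac{1 \cdot 3 \cdots (2n - 1)}{2 \cdot 4 \cdots (2n)} \cdot \frac{\pi}{2}. \tag{22.7-8}$$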


These two formulas are valid for n = 1, 2, ... . They may be derived in a
completely elementary fashion by using the reduction formula

$$\int_0^{\pi/2} \sin^m \theta\,d\theta = \frac{m - 1}{m} \int_0^{\pi/2} \sin^{m-2}\theta\,d\theta.$$


Our purpose in deriving (22.7-7) and (22.7-8) here is so that we may prove the
formula

$$\frac{\pi}{2} = \lim_{n \to \infty} \left[ \frac{2 \cdot 4 \cdots (2n)}{1 \cdot 3 \cdots (2n - 1)} \right]^2 \frac{1}{2n + 1}. \tag{22.7-9}$$

This formula is called Wallis's formula, after a 17th century English mathematician.
To prove (22.7-9) we observe that, if $0 < \theta < \pi/2$,

$$\sin^{2n+1}\theta < \sin^{2n}\theta < \sin^{2n-1}\theta,$$

and so

$$\int_0^{\pi/2} \sin^{2n+1}\theta\,d\theta < \int_0^{\pi/2} \sin^{2n}\theta\,d\theta < \int_0^{\pi/2} \sin^{2n-1}\theta\,d\theta.$$

Therefore

$$\frac{2 \cdot 4 \cdots (2n)}{1 \cdot 3 \cdots (2n + 1)} < \frac{1 \cdot 3 \cdots (2n - 1)}{2 \cdot 4 \cdots (2n)} \cdot \frac{\pi}{2} < \frac{2 \cdot 4 \cdots (2n - 2)}{1 \cdot 3 \cdots (2n - 1)}.$$

These inequalities can be rewritten in the form

$$\frac{2n}{2n + 1} \cdot \frac{\pi}{2} < \left[ \frac{2 \cdot 4 \cdots (2n)}{1 \cdot 3 \cdots (2n - 1)} \right]^2 \frac{1}{2n + 1} < \frac{\pi}{2},$$

and (22.7-9) follows at once.
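
The two-sided bound used in this proof also shows how slowly Wallis's formula converges: the gap between the bounds is of order 1/n. A short sketch, with the function name `wallis_term` chosen only for this illustration, evaluates the bracketed expression and checks it against both bounds.

```python
import math

def wallis_term(n):
    """[(2*4*...*2n) / (1*3*...*(2n-1))]^2 / (2n+1), built factor by factor."""
    ratio = 1.0
    for k in range(1, n + 1):
        ratio *= (2 * k) / (2 * k - 1)
    return ratio * ratio / (2 * n + 1)

if __name__ == "__main__":
    half_pi = math.pi / 2
    for n in (1, 10, 100, 1000):
        lower = 2 * n / (2 * n + 1) * half_pi
        print(n, lower, wallis_term(n), half_pi)
```

Each printed row should show the middle value lying strictly between the lower bound and pi/2, with the three columns closing in on one another as n grows.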


EXERCISES 

1. Show directly from (22.7-1) that B(y, x) = B(x, y) (make a simple change of the 
variable of integration). 

2. Show that $B(x, 1) = x^{-1}$.

3. Using integration by parts, show directly from (22.7-1) that yB(x + 1, y) =
xB(x, y + 1). Hence, if m + 1 and n + 1 are positive integers, show that
$B(m + 1, n + 1) = \frac{m!\, n!}{(m + n + 1)!}$, without recourse to (22.7-2).

4. Show directly from (22.7-1) that B(x + 1, y) + B(x, y + 1) = B(x, y). Combine this 
with the result in Exercise 3 to show that 
(c) the total area enclosed by the curve $x^{2/3} + y^{2/3} = 1$, (d) f ; . dx .

J 0 V 1 — A 

8. Show that $B(x, x) = 2 \int_0^{1/2} (t - t^2)^{x-1}\,dt$. Make the change of variable $u = 4(t - t^2)$,
and deduce the relation $B(x, x) = 2^{1-2x} B(x, \tfrac{1}{2})$. From this deduce that $\sqrt{\pi}\,\Gamma(2x) =
2^{2x-1}\,\Gamma(x)\,\Gamma(x + \tfrac{1}{2})$. Compare with Exercise 7, §22.2.

9. The gamma function can be expressed in the form

$$\Gamma(x) = 2 \int_0^\infty s^{2x-1} e^{-s^2}\,ds,$$

as we see by putting $t = s^2$ in (22.2-1). Show that, as a consequence,

$$\tfrac{1}{4}\,\Gamma(x)\,\Gamma(y) = \iint_R s^{2x-1} t^{2y-1} e^{-(s^2 + t^2)}\,dA,$$

where R is the entire first quadrant of the st-plane. Using a method similar to that of
Example 2, §22.41, evaluate this improper double integral with the aid of polar coordinates,
and show that it has the value $\tfrac{1}{4} B(x, y)\,\Gamma(x + y)$. This gives us a new derivation
of the formula (22.7-2).


10. Show that $\int_1^\infty \frac{y^{a-1}}{1 + y}\,dy = \int_0^1 \frac{x^{-a}}{1 + x}\,dx$ if 0 < a < 1, and hence (see (22.7-3)) that

$$\Gamma(a)\,\Gamma(1 - a) = \int_0^1 \frac{x^{a-1} + x^{-a}}{1 + x}\,dx.$$


11. Use the geometric series for $(1 + x)^{-1}$ in the last integral in Exercise 10, and
integrate the resulting series term by term. Show that the result can be put in the form

$$\Gamma(a)\,\Gamma(1 - a) = \frac{1}{a} + 2a \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n^2 - a^2}.$$

It can be shown that the series on the right has
the value $\pi/(\sin a\pi)$. Thus

$$B(a, 1 - a) = \int_0^\infty \frac{t^{a-1}}{1 + t}\,dt = \frac{\pi}{\sin a\pi} \qquad \text{if } 0 < a < 1.$$


12. If 0 < a < 1, show, using (22.7-3), that $B(a, 1 - a) = \frac{1}{a} \int_0^\infty \frac{dx}{1 + x^{1/a}}$. Hence show
that, if n > 1, $\int_0^\infty \frac{dx}{1 + x^n} = \frac{\pi}{n} \csc\frac{\pi}{n}$. Use the result stated in Exercise 11.

Jo 1 + X n n 


22.8 / STIRLING'S FORMULA

The factorial function features in many problems in mathematics and related
fields, especially statistics and physics. A significant fact about n! is that it grows
rapidly as n increases, and it is very useful in practice to have a convenient
approximate formula for n! when n is large. We shall show that for every
positive integer n, there is a number $\theta$ (depending on n) such that $0 < \theta < 1$, and

$$n! = n^n e^{-n} \sqrt{2\pi n}\; e^{\theta/(12n)}. \tag{22.8-1}$$

This is known as Stirling's formula. Its usefulness depends on the fact that $e^{\theta/(12n)}$
is close to 1 when n is large. Accordingly, the percentage error in using
$n^n e^{-n} \sqrt{2\pi n}$ as an approximation for n! approaches zero as $n \to \infty$. We shall
elaborate on this point later. Stirling did his work on approximating n! about
1730.

The ingenious derivation of Stirling's formula which we shall give depends
on a clever method of obtaining a formula for the logarithm of n!. We start from
the fact that 


$$\log n! = \sum_{p=2}^{n} \log p. \tag{22.8-2}$$

Now consider the graph of y = log x from x = p - 1 to x = p. In Fig. 184 we have
exaggerated the curvature of the graph to make the region between the graph
and the dotted chord easier to see. When p is large the graph of the logarithm
from p - 1 to p is not very different from the chord and the area between the
two, which we shall denote by $\epsilon_p$, is quite small.



In the figure, the area of the rectangle is the same as the height, namely
log p. We can think of this rectangular area as being made up of the trapezoidal
area under the chord plus the triangular area above the chord. These parts are,
respectively,

$$\int_{p-1}^{p} \log x\,dx - \epsilon_p \quad \text{and} \quad \tfrac{1}{2} [\log p - \log(p - 1)].$$

Therefore

$$\log p = \int_{p-1}^{p} \log x\,dx + \tfrac{1}{2} [\log p - \log(p - 1)] - \epsilon_p. \tag{22.8-3}$$

Forming the sum from p = 2 to p = n, we find, using (22.8-2), that

$$\log n! = \int_1^n \log x\,dx + \tfrac{1}{2} \log n - \sum_{p=2}^{n} \epsilon_p.$$

By carrying out the integration we get

$$\log n! = n \log n - n + 1 + \tfrac{1}{2} \log n - \sum_{p=2}^{n} \epsilon_p. \tag{22.8-4}$$

Our next problem is to deal with the sum of the $\epsilon_p$'s. By carrying out the
integration in (22.8-3), solving for $\epsilon_p$, and simplifying, we obtain
$\epsilon_p = (p - \tfrac{1}{2}) \log p - (p - \tfrac{1}{2}) \log(p - 1) - 1$, which can be written

$$\epsilon_p = \frac{2p - 1}{2} \log\frac{p}{p - 1} - 1. \tag{22.8-5}$$

Our purpose now is to show that the infinite series $\sum_{p=2}^{\infty} \epsilon_p$ is convergent and to
obtain an estimate for the remainder of this series after the term $\epsilon_n$. The step we
now take depends on a clever use of the infinite series

$$\log\frac{1 + t}{1 - t} = 2 \left( t + \tfrac{1}{3} t^3 + \tfrac{1}{5} t^5 + \cdots \right),$$

which, by (19.6-1), we know to be valid if |t| < 1. We put $t = 1/(2p - 1)$; clearly
$0 < t \le \tfrac{1}{3}$ if $p \ge 2$, and it is easily shown that $(1 + t)/(1 - t) = p/(p - 1)$. Substituting in the
foregoing series gives

$$\log\frac{p}{p - 1} = 2 \left[ \frac{1}{2p - 1} + \frac{1}{3(2p - 1)^3} + \frac{1}{5(2p - 1)^5} + \cdots \right].$$

It now follows from (22.8-5) that we have, for $p \ge 2$,

$$\epsilon_p = \frac{1}{3(2p - 1)^2} + \frac{1}{5(2p - 1)^4} + \frac{1}{7(2p - 1)^6} + \cdots. \tag{22.8-6}$$

We compare this expression for $\epsilon_p$ with the quantity $\delta_p$ defined as follows:

$$\delta_p = \frac{1}{3(2p - 1)^2} + \frac{1}{3(2p - 1)^4} + \frac{1}{3(2p - 1)^6} + \cdots. \tag{22.8-7}$$

This is a geometric series whose sum we can calculate, because

$$r^2 + r^4 + r^6 + \cdots = \frac{r^2}{1 - r^2} \quad \text{when } |r| < 1.$$

Letting $r^2 = 1/(2p - 1)^2$ in (22.8-7) we see that

$$\delta_p = \frac{1}{3(2p - 1)^2} \cdot \frac{1}{1 - [1/(2p - 1)^2]}.$$

Simplifying this, we find that

$$\delta_p = \frac{1}{12 p (p - 1)}.$$

In order to find a simple expression for $\sum_{p=2}^{n} \delta_p$ we observe that

$$\frac{1}{p(p - 1)} = \frac{1}{p - 1} - \frac{1}{p}$$


(22.8-8) 
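and that

$$\sum_{p=2}^{n} \frac{1}{p(p - 1)} = \left( 1 - \frac{1}{2} \right) + \left( \frac{1}{2} - \frac{1}{3} \right) + \cdots + \left( \frac{1}{n - 1} - \frac{1}{n} \right) = 1 - \frac{1}{n}.$$

The cancellation of terms makes finding the sum easy. We have here what is called a
telescoping sum. We see from the foregoing that

$$\sum_{p=2}^{n} \delta_p = \frac{1}{12} \left( 1 - \frac{1}{n} \right)$$

and that

$$\sum_{p=2}^{\infty} \delta_p = \lim_{n \to \infty} \sum_{p=2}^{n} \delta_p = \frac{1}{12}, \qquad
\sum_{p=n+1}^{\infty} \delta_p = \sum_{p=2}^{\infty} \delta_p - \sum_{p=2}^{n} \delta_p = \frac{1}{12n}.$$

But then, because it is clear from (22.8-6) and (22.8-7) that $0 < \epsilon_p < \delta_p$, we
conclude that

$$\sum_{p=n+1}^{\infty} \epsilon_p = \frac{\theta}{12n}, \tag{22.8-9}$$

where $\theta$ is the ratio of $\sum_{p=n+1}^{\infty} \epsilon_p$ to $\sum_{p=n+1}^{\infty} \delta_p$. The important thing for us is that
$0 < \theta < 1$. The value of $\theta$ depends on n, of course, and at times it is desirable to
emphasize the dependency by a subscript: $\theta = \theta_n$.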

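Now let

$$\alpha = \sum_{p=2}^{\infty} \epsilon_p, \quad \text{so that} \quad \sum_{p=2}^{n} \epsilon_p = \alpha - \frac{\theta}{12n}. \tag{22.8-10}$$

We substitute this last result in (22.8-4) and go to exponentials on both sides of
the equation, thus obtaining

$$n! = n^n e^{-n} \sqrt{n}\; e^{1 - \alpha}\, e^{\theta/(12n)}. \tag{22.8-11}$$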

It remains to find the value of the constant term $e^{1 - \alpha}$, which we shall denote
by C. We can achieve our purpose by using Wallis's formula (22.7-9). We
observe that $2 \cdot 4 \cdots 2n = 2^n n!$ and that

$$1 \cdot 3 \cdot 5 \cdots (2n - 1) = \frac{(2n)!}{2 \cdot 4 \cdots (2n)} = \frac{(2n)!}{2^n n!},$$

so that Wallis's formula can be written

$$\lim_{n \to \infty} \frac{2^{4n} (n!)^4}{[(2n)!]^2 (2n + 1)} = \frac{\pi}{2}. \tag{22.8-12}$$

From (22.8-11), with $e^{1 - \alpha} = C$,

$$(n!)^4 = C^4 n^{4n} e^{-4n} n^2 e^{\theta_n/(3n)}, \quad \text{and} \quad
(2n)! = C\, 2^{2n} n^{2n} e^{-2n} \sqrt{2n}\; e^{\theta_{2n}/(24n)}.$$

Substituting these into (22.8-12) gives a rather formidable-looking expression
which can, after much cancellation, be reduced to

$$\lim_{n \to \infty} \frac{C^2\, n\, e^{\theta_n/(3n)}}{2(2n + 1)\, e^{\theta_{2n}/(12n)}} = \frac{\pi}{2}.$$

But the limit on the left side is, clearly, $C^2/4$. Therefore $C^2/4 = \pi/2$, or $C = \sqrt{2\pi}$.
Substituting this value for C into our above formula for $(n!)^4$, and taking the
fourth root of both sides, we get (22.8-1). In our derivation of Stirling's formula
we have made use of the hypothesis that $n \ge 2$, but it is a simple matter to verify
(22.8-1) directly in the special case where n = 1.

It follows from Stirling's formula that

$$\lim_{n \to \infty} \frac{n!}{n^n e^{-n} \sqrt{2\pi n}} = 1. \tag{22.8-13}$$

The fact that the above ratio approaches 1 as $n \to \infty$ is expressed in words by
saying that n! is asymptotically equal to $n^n e^{-n} \sqrt{2\pi n}$. It is important to notice
that even though the ratio of n! to $n^n e^{-n} \sqrt{2\pi n}$ approaches 1 as n approaches
infinity, this does not imply that the difference between n! and $n^n e^{-n} \sqrt{2\pi n}$
approaches zero; in fact, it approaches infinity. So asymptotic approximations
can be somewhat different from the kinds of approximations we have been used
to.

We can easily obtain some notion of the behavior of the fraction on the left 
in (22.8-13), as a function of n, by observing from (22.8-1) that 


n • _ e ei\2n 

n rt e~ n \ / 27m 


(22.8-14) 


and estimating the value of e 6l]2n . Clearly it is larger than 1, because e x is an 
increasing function which equals 1 when x = 0. Using the expansion of e x in 
powers of x we see that 


e lll2n 



<1 + 



The last series here is a geometric series whose sum is 


1 

1 — (1/ 12n ) 


12n 

12n - 1 


1 + 


1 

12n - 1 


Thus we see that 


1 < < 1 + — - — ’ (22.8-15) 

nV"V2irn 12n - 1 

and this shows that the ratio in question is rapidly squeezed downward closer and 
closer to 1 as n increases. The fourth column of the following table (which can 
be easily checked with the aid of a pocket calculator) shows how close the ratio 
is to 1, even for small values of n. In view of this, the fifth column is rather 
surprising in showing a startling rate of increase in the difference between the
numerator and the denominator. 


 n    $n!$                   $n^n e^{-n}\sqrt{2\pi n}$   $n!/(n^n e^{-n}\sqrt{2\pi n})$   $n! - n^n e^{-n}\sqrt{2\pi n}$

 5    120                    118                         1.01695                          2
10    3,628,800              3,598,696                   1.00837                          30,104
15    1,307,674,368,000      1,300,430,725,000           1.00557                          7,243,642,000
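
The table can be checked with a few lines of code in place of a pocket calculator. The sketch below is illustrative only; the function names `stirling` and `table_row` are hypothetical, and the direct formula is suitable only for small $n$ because `n ** n` soon exceeds floating-point range.

```python
import math

def stirling(n):
    """Leading Stirling approximation n^n e^{-n} sqrt(2*pi*n) (small n only)."""
    return n ** n * math.exp(-n) * math.sqrt(2 * math.pi * n)

def table_row(n):
    """Return (n, n!, approximation, ratio, difference) for one table row."""
    exact = math.factorial(n)
    approx = stirling(n)
    return n, exact, approx, exact / approx, exact - approx

if __name__ == "__main__":
    for n in (5, 10, 15):
        _, exact, approx, ratio, diff = table_row(n)
        print(f"{n:>2} {exact:>22,} {approx:>26,.0f} {ratio:.5f} {diff:>18,.0f}")
```

The ratio column shrinks toward 1 while the difference column grows, which is the contrast the text draws between asymptotic equality and closeness in the ordinary sense.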


It is an interesting and not difficult exercise to improve the first (left-hand 
side) of the two inequalities in (22.8-15) by showing that 

$$1 + \frac{1}{12n + 1} < \frac{n!}{n^n e^{-n}\sqrt{2\pi n}}. \tag{22.8-16}$$

We use a technique analogous to that used in going from (22.8-6) to (22.8-7) (a
geometric series) and then estimating sums of the $\delta_p$'s by use of telescoping
sums. This time we want something smaller than $\varepsilon_p$. We let

$$r_p = \frac{1}{3(2p-1)^2} + \frac{1}{3^2(2p-1)^4} + \frac{1}{3^3(2p-1)^6} + \cdots.$$

It is clear that $\varepsilon_p > r_p$ because

$$\frac{1}{5} > \frac{1}{3^2}, \quad \frac{1}{7} > \frac{1}{3^3}, \quad \ldots, \quad \frac{1}{2n+1} > \frac{1}{3^n},$$

as may be proved easily by mathematical induction. Also, by the formula for the
sum of a geometrical series,

$$r_p = \frac{1}{3(2p-1)^2} \cdot \frac{1}{1 - \dfrac{1}{3(2p-1)^2}} = \frac{1}{3(2p-1)^2 - 1}.$$

Next, we want to get a lower bound for the value of $\sum_{p=n+1}^{\infty} r_p$ by use of a
telescoping series. Things are not as simple here as they were with the $\delta_p$'s. We
write

$$r_p = \frac{1}{12\left(p^2 - p + \tfrac{1}{6}\right)}$$

and observe that, to get a telescoping series, we want an inequality of the form

$$r_p \ge \frac{1}{12(p - 1 + a)(p + a)} = \frac{1}{12}\left(\frac{1}{p - 1 + a} - \frac{1}{p + a}\right)$$

for some positive number $a$, so that when we sum the expressions on the right
we will get a telescoping series. What we need, then, is to have

$$p^2 - p + \tfrac{1}{6} \le (p - 1 + a)(p + a) = p^2 - p + ap + ap - a + a^2,$$

or

$$\tfrac{1}{6} \le 2ap - a + a^2 = a(2p - 1 + a) \quad \text{when } p \ge 2.$$

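This inequality will hold when $p > 2$ if it holds when $p = 2$, so we want
$\tfrac{1}{6} \le a(a + 3)$. Various choices are possible for $a$; $a = \tfrac{1}{12}$ is acceptable, since
$\tfrac{1}{12}\left(\tfrac{1}{12} + 3\right) = \tfrac{37}{144} > \tfrac{1}{6}$. For simplicity of arithmetic we shall choose $a = \tfrac{1}{12}$. We then have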
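
The choice of $a$ and the telescoping step can be verified with exact rational arithmetic. This is a minimal sketch, not part of the original argument; the helper names `r`, `telescoping_term`, and `partial_sum` are hypothetical, and the termwise inequality is only spot-checked over a finite range of $p$.

```python
from fractions import Fraction

def r(p):
    """r_p = 1 / (3(2p-1)^2 - 1), the closed form of the geometric series."""
    return Fraction(1, 3 * (2 * p - 1) ** 2 - 1)

def telescoping_term(p, a):
    """(1/12) * (1/(p-1+a) - 1/(p+a)), which equals 1 / (12 (p-1+a)(p+a))."""
    return Fraction(1, 12) * (1 / (p - 1 + a) - 1 / (p + a))

def partial_sum(n, N, a):
    """Sum of the telescoping terms for p = n+1, ..., N."""
    return sum(telescoping_term(p, a) for p in range(n + 1, N + 1))

a = Fraction(1, 12)

# The condition derived in the text: 1/6 <= a(a + 3).
assert a * (a + 3) >= Fraction(1, 6)

# Termwise inequality r_p >= telescoping term, spot-checked for 2 <= p < 500.
assert all(r(p) >= telescoping_term(p, a) for p in range(2, 500))

# The partial sum collapses to (1/12) * (1/(n+a) - 1/(N+a)).
n, N = 3, 50
assert partial_sum(n, N, a) == Fraction(1, 12) * (1 / (n + a) - 1 / (N + a))

# As N grows the sum tends to 1/(12n + 12a), which is 1/(12n + 1) when a = 1/12.
assert Fraction(1, 12) * (1 / (n + a)) == Fraction(1, 12 * n + 1)
```

Using `Fraction` rather than floats keeps every comparison exact, so a passing check is not an artifact of rounding.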


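$$\sum_{p=n+1}^{\infty} \varepsilon_p > \sum_{p=n+1}^{\infty} r_p \ge \frac{1}{12} \sum_{p=n+1}^{\infty} \left(\frac{1}{p - 1 + \tfrac{1}{12}} - \frac{1}{p + \tfrac{1}{12}}\right) = \frac{1}{12n + 1}.$$

Since, by (22.8-9),

$$\frac{\theta}{12n} = \sum_{p=n+1}^{\infty} \varepsilon_p,$$

we get

$$\frac{\theta}{12n} > \frac{1}{12n + 1}, \qquad \text{and hence} \qquad e^{\theta/12n} > e^{1/(12n+1)}.$$

But now, because $e^x = 1 + x + \dfrac{x^2}{2!} + \cdots > 1 + x$ when $x > 0$, we see that

$$e^{\theta/12n} > 1 + \frac{1}{12n + 1}.$$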

In view of (22.8-14) this gives us (22.8-16). The reasoning which we used made
use of the assumption that $n \ge 2$, but it can be easily verified that both (22.8-15)
and (22.8-16) continue to hold when $n = 1$. In summary, we have shown that for
every positive integer $n$,

$$1 + \frac{1}{12n + 1} < \frac{n!}{n^n e^{-n}\sqrt{2\pi n}} < 1 + \frac{1}{12n - 1}.$$
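
The two-sided estimate lends itself to a quick numerical check. The sketch below is an illustration rather than a proof; the names `stirling_ratio` and `within_bounds` are hypothetical, and logarithms (via `math.lgamma`) are used so that large $n$ does not overflow.

```python
import math

def stirling_ratio(n):
    """n! / (n^n e^{-n} sqrt(2*pi*n)), computed through logarithms."""
    log_factorial = math.lgamma(n + 1)  # log(n!)
    log_stirling = n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)
    return math.exp(log_factorial - log_stirling)

def within_bounds(n):
    """True if 1 + 1/(12n+1) < ratio < 1 + 1/(12n-1)."""
    ratio = stirling_ratio(n)
    return 1 + 1 / (12 * n + 1) < ratio < 1 + 1 / (12 * n - 1)

if __name__ == "__main__":
    for n in (1, 2, 5, 10, 100):
        lower = 1 + 1 / (12 * n + 1)
        upper = 1 + 1 / (12 * n - 1)
        print(n, lower, stirling_ratio(n), upper)
```

The gap between the bounds shrinks like $1/n^2$, so for very large $n$ floating-point rounding would eventually blur the comparison; the range used here stays well clear of that.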

MISCELLANEOUS EXERCISES 
1. Prove the following: 

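(a) $\int_0^\infty \frac{x}{(1+x)^3}\,dx = \frac{1}{2}\int_0^\infty \frac{dx}{(1+x)^2} = \frac{1}{2}.$

(b) $\int_1^\infty \frac{\sqrt{x}}{(1+x)^2}\,dx = \frac{1}{2} + \frac{\pi}{4}.$

(c) $\int_a^\infty \frac{dx}{x^4\sqrt{a^2+x^2}} = \frac{2-\sqrt{2}}{3a^4}$ $(a > 0)$.

(d) $\int_0^\infty \frac{dx}{(a^2+x^2)(b^2+x^2)} = \frac{\pi}{2ab(a+b)}.$

(e) $\int_0^\infty \frac{x^2\,dx}{(a^2+x^2)(b^2+x^2)} = \frac{\pi}{2(a+b)}.$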

x 2 dx _ tt /L ^ m 

(x^PfTFP-2b (b>0) - 

X 4 dx TT ^ 


[(x-a)+ rx] 4b 


(b >0). 
2. Examine the following integrals as to convergence or divergence:

(a) J log[ l±- ( - 1 »1 dx (b) | log(1 + e -) fa 

3. Suppose $f(x) \ge 0$, $a_n = \int_{x_n}^{x_{n+1}} f(x)\,dx$, where $c = x_0 < x_1 < x_2 < \cdots$, and $x_n \to \infty$.
Prove that $\int_c^\infty f(x)\,dx$ is convergent if and only if $\sum a_n$ is convergent.

4. Let $F(x) = \int_0^{\pi/2} \log(1 - x^2\cos^2\theta)\,d\theta$, $0 \le x^2 \le 1$.

(a) Find $F'(x)$ when $0 \le x < 1$, by differentiation under the integral sign (this is justified by
Theorem XIV, §18.5); evaluate the resulting integral by use of standard tables. Then
integrate and find $F(x) = \pi \log\dfrac{1 + \sqrt{1 - x^2}}{2}$, at least if $0 \le x^2 < 1$.

(b) Prove that $F$ is continuous at $x = 1$, by use of Theorem VII, and hence deduce from
(a) that $\int_0^{\pi/2} \log\sin\theta\,d\theta = (\pi/2)\log\tfrac{1}{2}$.

5. Let $I = \int_0^{\pi} \log\sin\theta\,d\theta$. Show that

$$I = 2\int_0^{\pi/2} \log\sin\theta\,d\theta = 2\int_0^{\pi/2} \log\cos\theta\,d\theta.$$

Then use the formula $\sin\theta = 2\sin(\theta/2)\cos(\theta/2)$ to deduce that $I = -\pi\log 2$.

6. $\int_0^\infty x e^{-ax}\cos\beta x\,dx = \dfrac{a^2 - \beta^2}{(a^2 + \beta^2)^2}$ $(a > 0)$.

7. In the integral $F(x) = \int_0^\infty e^{-t^2 - (x^2/t^2)}\,dt$, assume that $x > 0$, and make the change of
variable $u = t - (x/t)$. The resulting integral will be of the form $\int_{-\infty}^{\infty} \phi(x, u)\,du$. Express
$\int_{-\infty}^{0} \phi(x, u)\,du$ in the form $\int_0^\infty \phi(x, -u)\,du$ and in this way deduce the value of $F(x)$.

8. Suppose $\int_a^\infty f(x)\,dx$ is convergent. Suppose also that $\phi(x) > 0$, that $\phi'(x)$ is
continuous, and $\phi'(x) \le 0$. Show that $\int_a^\infty \phi(x) f(x)\,dx$ is convergent.

f 00 cos x 2 

9. Use the result of Exercise 8 to show that J tan ~i ~ dx is convergent. 

10. Let F(x) = — f ~4 ~~~72 ‘ Find the value of F(x) for each x. Is F continuous at 

7T Jo X + t 

x = 0? 

11. Let $F(y) = \int_0^\infty y^3 e^{-xy^2}\,dx$. Show that $F(y) = y$ for all values of $y$. Verify that
$F'(y) = \int_0^\infty \frac{\partial}{\partial y}\left(y^3 e^{-xy^2}\right) dx$ if $y \ne 0$, but that this is false if $y = 0$. What do you conclude from
this about the uniform convergence of the integral $\int_0^\infty y^2(3 - 2y^2 x)e^{-xy^2}\,dx$?

12. Is the equation

$$\int_0^a dy \int_0^\infty (2y - 2xy^3)e^{-xy^2}\,dx = \int_0^\infty dx \int_0^a (2y - 2xy^3)e^{-xy^2}\,dy$$

true or false when $a > 0$?

13. Let $f(x, y) = \sin(x^2 + y^2)$. Let $R$ be the square region $0 \le x \le a$, $0 \le y \le a$, and let
$T$ be the portion of the circular region $x^2 + y^2 \le a^2$ which lies in the first quadrant. Show
that, if $I = \iint_R f(x, y)\,dA$ and $J = \iint_T f(x, y)\,dA$, then $I \to \pi/4$ as $a \to \infty$, but $J$ does not
approach any limit.

14. (a) Let $F(x) = \int_0^\infty \frac{\sin xt}{t(1 + t^2)}\,dt$. Show that $F''(x) - F(x) = -\pi/2$ if $x > 0$. Take for
granted that $F''(x)$ can be calculated by differentiating under the integral. Solve the
differential equation and evaluate $F(0)$, $F'(0)$, and thus show that, if $x \ge 0$,

$$\int_0^\infty \frac{\sin xt}{t(1 + t^2)}\,dt = \frac{\pi}{2}(1 - e^{-x}), \qquad \int_0^\infty \frac{\cos xt}{1 + t^2}\,dt = \frac{\pi}{2}e^{-x}.$$

(b) Show how to deduce the value of $\int_0^\infty \frac{t\sin xt}{1 + t^2}\,dt$ from the results in (a). State what
must be proved to be legitimate for the success of your method, but do not prove it.

15. Suppose $f(x)$ is continuous when $x \ge 0$, that $\lim_{x\to 0+} \frac{f(x) - f(0)}{x}$ exists, and that
$\int_1^\infty \frac{f(u)}{u}\,du$ is convergent. From these assumptions it may be proved that

$$\int_0^\infty \frac{f(bx) - f(ax)}{x}\,dx = f(0)\log\frac{a}{b}$$

if $a$ and $b$ are positive. Write out a proof with the aid of the following suggestions:
(a) Show that, if $y > 0$,

$$\int_0^y \frac{f(ax) - f(0)}{x}\,dx = \int_0^{ay} \frac{f(u) - f(0)}{u}\,du,$$

explaining why the integrals are not improper at the lower limit.
(b) Explain why

$$\lim_{y\to\infty} \int_{ay}^{by} \frac{f(u)}{u}\,du = 0.$$

(c) Complete the proof.

16. Show that the functions $\cos x$ and $(\pi/2) - \tan^{-1} x$ satisfy the conditions on $f(x)$ in
Exercise 15, and so find the values of the integrals

$$\int_0^\infty \frac{\cos bx - \cos ax}{x}\,dx, \qquad \int_0^\infty \frac{\tan^{-1} bx - \tan^{-1} ax}{x}\,dx.$$


17. Suppose that $f(x)$ is continuous when $x \ge 0$, and that $f'(x)$ is continuous when
$x > 0$. Suppose also that $\lim_{x\to\infty} f(x)$ exists, and denote the limit by $f(\infty)$. From these
assumptions it may be proved that, if $a$ and $b$ are positive,

$$\int_0^\infty \frac{f(bx) - f(ax)}{x}\,dx = (f(\infty) - f(0))\log(b/a).$$

Write out a proof with the aid of the following suggestions:

(a) Apply Theorem IX to $\int_a^b dt \int_0^\infty f'(tx)\,dx$.

(b) Show that $\int_1^\infty f'(tx)\,dx$ is convergent uniformly for $t \ge a$, and the same for $\int_0^1 f'(tx)\,dx$
in case this integral is improper at $x = 0$.

18. Apply the result of Exercise 17 to each of the functions $e^{-x}$, $\tan^{-1} x$.

19. Show that, if 0 < a < 1, J ^ gt dt = r(a)r(l — a). 

20. (a) Let $F(\theta) = \int_0^\infty e^{-t\cos\theta}\, t^{a-1}\cos(t\sin\theta)\,dt$,

$G(\theta) = \int_0^\infty e^{-t\cos\theta}\, t^{a-1}\sin(t\sin\theta)\,dt$,

where $-\pi/2 < \theta < \pi/2$ and $a > 0$. Show that $F'(\theta) = -aG(\theta)$ and $G'(\theta) = aF(\theta)$, whence
$F''(\theta) + a^2 F(\theta) = 0$. From this deduce that $F(\theta) = \Gamma(a)\cos a\theta$ and $G(\theta) = \Gamma(a)\sin a\theta$.

(b) Show how to obtain the formulas

$$\int_0^\infty \frac{\cos t}{t^{1-a}}\,dt = \Gamma(a)\cos\frac{\pi a}{2}, \qquad \int_0^\infty \frac{\sin t}{t^{1-a}}\,dt = \Gamma(a)\sin\frac{\pi a}{2}$$

from the results in (a). State without proof the facts about uniform convergence which
would be sufficient to justify the procedure.

(c) The first formula in (b) is valid if $0 < a < 1$. The second is actually valid if $-1 < a < 1$,
$a \ne 0$, $\Gamma(a)$ for $a < 0$ being defined as in §22.2 (see (22.2-10)). Deduce that
$\int_0^\infty \frac{\sin t}{t^a}\,dt = \Gamma(1 - a)\cos\frac{\pi a}{2}$ if $0 < a < 2$, $a \ne 1$.


21. Some improper integrals can be evaluated by expanding the integrand in a series 
and integrating term by term. For improper integrals the justification of this procedure is 
not covered by Theorem IV, §20.4. In the following, assume the legitimacy of the
procedure, and deduce the results as listed.


(a) L ^ 


(b) L i log T^ dx = 2 (p + ? + ? + ^ + "')- 
(d) L lo e^T djc = 2 (p + ? + ? + ^ + '")- 

w fA* aj (p + ? + ? + ? + ") 


22. Deduce the formula $\int_0^\infty e^{-ax^2}\cos 2bx\,dx = \tfrac{1}{2}\sqrt{\pi/a}\,e^{-b^2/a}$ (assuming $a > 0$) by
expressing the cosine as a power series and integrating term by term.

23. In picking a value for the number $a$ such that

$$r_p \ge \frac{1}{12(p - 1 + a)(p + a)}$$

for all $p \ge 2$ (see §22.8), it was observed that many choices are possible. For simplicity we
took $a = \tfrac{1}{12}$. Considering only positive values of $a$, what choice makes the upper and lower
bounds for

$$\frac{n!}{n^n e^{-n}\sqrt{2\pi n}}$$

as close together as possible?

When $n = 10$, what percentage reduction in the difference between these two bounds
is achieved by using the best positive value of $a$ rather than $\tfrac{1}{12}$?



Answers to Selected Exercises 


CHAPTER 1 

§1.1 Pages 9-12 

1. (a) 8; (c) n; (e)-J; (h) 2. 

2. (a) 0; (b) 1; (d) 

3. (a) Yes; (c) Yes, $\lim_{x\to 0} f(x) = 0$; no, $f$ not continuous at $x = 0$. (d) yes; (e) no.

4. (a) 2; (b) 1; (c) 12. 9. (a) 0; (b) 1; (c) -1. 

10. (a) Approaches 0; (b) becomes infinitely large and positive; (c) does not exist. 

13. (a) and (c). 14. (c) 0. 

15. (b) f(x) = -1 — x if 0<x < 1; /(x) = 1 + x if 1 < x. 

16. (a) $f(-1) = 2$; (c) the limit does not exist.

17. (a) /(§) = 1; (b) yes; (d) no. 

18. (a) 0, 1, -1, 0, -1; (b) no. 

19. Take $\delta$ to be the smaller of the numbers 1, $\varepsilon/5$.

21. Take 8 to be the smaller of the numbers 1, ell . 

24. No; $\lim_{x\to 0+} f(x) = 1$, $\lim_{x\to 0-} f(x) = 3$.

§1.11 Pages 18-20 

9. 20077. 10. x = ct\ 11. 0. 14. (c) No. 15. (c) 0. 

18. (a) n ^ 2; (b) n ^ 4; (c) n ^ 5. 

21. / is continuous but not differentiable at x = 1, x = 2. f'(b = 1/V2. 

24. In first case yes, with $f'(0) = 0$; in second case $f$ is not differentiable at $x = 0$.

§1.12 Pages 24-26 

1. (b) Max. 0 at x = -5, min. -226 at x = 06. 

2. Max. 7000, min._200(35-27V5). 3. 10. 5. g. 

6. Abs. min. 48 ^4; no abs. max. 7. Abs. min. 125. 

8. (a) 2, -2; (b) I, -3. 9. ^ . 11. (c) 2 r < h. 

12. (b) Max. 4, min. tt. Hint: Use symmetry about line x =i, and study the graph of 
F(x), where f(x) = 

13. (d) 25. 14. (c) Ci > c 2 and c 2 a < hVc?- cl. 15. Max. 

16. (a) Quickest to walk all way around from A to B. 

(b) It is never quickest to mix rowing and walking. Quickest to row directly across if 

7T 73" 

vlu <y, and to walk all the way if vlu >y. 



§1.2 Pages 30-32 

3. 2-s V21. 4. CS1. 


§1.3 Pages 34-35 

1. (a) dy = -2; (b) dx = 2. 5. tty = p = da 


§1.4 Page 38 

3. If $F(x) = 0$ when $0 \le x \le 1$, $F(x) = x - 1$ when $1 \le x \le 2$, and $F(x) = 2x - 3$ when
$2 \le x \le 3$, then $F'(x) = [x]$ except at $x = 1, 2, 3$, but $F'(x)$ is not defined at these
points.

§1.5 Pages 43-44 

1- (a) 10, -2; (b) 7, 1; (c) 4. 

2. (a) 1.09458; (b) s = 0.99564, S = 1.21786. 

6. (b) A,(n) = V(n + l) 2 . 9. Low 1.3339, high 1.4052. 

§1.51 Page 46 
2. .1 77a. 3. jirR 2 . 


§1.52 Page 49 

2. (a) 1/V3. 3. (b) 2/tt; (c) 0. 4. (a) 1; (b) 0. 

5. (a) x = 0, 1; (b) 0. 6.0. 

§1.53 Pages 51-53 

3. (a) log 4; (b) log 2; (c) log (d) log j; (e) log 10. 

5. (a) ir/3; (b) hit; (c) 5tt/12V3; (d) tt/ 2. 8. (d) (tt/ 2)- jtan“' 2. 

15. sir(b - a) 2 . 18. -n- 2 /4. 

§1.61 Pages 57-58 

6. Take M the larger of the numbers s, (7 + e)/( 3e). 

11. (a) /’(0) = +=; (b) / 1(0) = +«, /'CO) = -oo. 

§1.62 Pages 65-66 

1. (b), (e), and (f) convergent. 2. N > 1/(3 e) will do. 3. A = 6. 

4. Take N greater than 10 and 10 n /(10!e). 7. (b) 2 r . 8. 5. 

9. (a) 0; (b) l 11. /(*) = 0 if x * 1, /(l) = l 14. (a) i 20. (b) e 2 . 

Miscellaneous Exercises Pages 70-71 

3. l(a + b). 5. 2. 7. (b) (1 - cos 7ra)(7ra). 9. a + b. 

10. Continuous, but not differentiable. 

12. F'(0) = 2. 13. Max. 8, min. -4. 


CHAPTER 2 

§2.7 Page 82 

5. The limit is 2. 


6. The limit is 1(1 + Vl + 4c). 
Miscellaneous Exercises Pages 83-84

4. N = 3. 6. The l.u.b. is 2.

7. The l.u.b. is f; the g.l.b. is -1. 8. (b) 1. 


CHAPTER 3 

§3 Page 86 
1. No. 

3. The answer is “yes” in both cases. In the second case, consider $f(x) = -x$ if $x < 0$,
$f(x) = 1$ if $x \ge 0$, and $g(x) = 1$ if $x < 0$, $g(x) = x$ if $x \ge 0$.

4. The answers are “no” and “yes”, respectively. 

5. No. 

§3.1 Page 88 

1. Yes. 2. Yes. 3. (b), (c), (d). 4. (a) 14; (b) 6; (c) 13. 

§3.2 Page 90 

2. M = 1, m = 0; M not attained. 

3. M - 7t/2, m = -ttI 2; neither attained. 

4. M = 1, m = 0; m not attained. 

§3.3 Pages 91-92 

5. Consider the function F(x) defined by F(x)^J f(t) dt'j = J f(t) dt. 


Miscellaneous Exercises Pages 92-94 

1. (ii). (a) g undefined at 1, — 1; (b) continuous elsewhere; (c) may be defined at 1 by 

g(l) = i, and is then continuous there. 

(iv). (a) F undefined at 4, 9, and for x <0; (b) continuous elsewhere; (c) con- 

tinuous at 4 if one defines F(4) = -32. 

(vi). (a) H undefined at 2, and for x <f; (b) continuous elsewhere; (c) no. 

3. Let $x = y = 0$ and use $f(0 + 0) = f(0) + f(0)$ to conclude that $f(0) = 0$. Observe that
$\lim_{x\to x_0} f(x) = f(x_0)$ is equivalent to $\lim_{h\to 0} f(x_0 + h) = f(x_0)$.
Explain why $\lim_{h\to 0} f(h) = 0$. Now use $f(x_0 + h) = f(x_0) + f(h)$ to complete the proof.

7. There must be at least one positive root and at least one negative root. 

11. (a) Define $L$ as the set of all real numbers $x$ such that $f(u) < 0$ if $u < x$. Define $R$ as
the set of all real numbers not in $L$; that is, $x$ is in $R$ if and only if there is some $u$
with $u < x$ and $f(u) \ge 0$. It needs to be verified that $L$ and $R$ define a cut and that,
because of the continuity of $f$ at the cut number $c$, $f(c)$ can be neither positive nor
negative.

16. (a) Only at x = i 17. (c) Yes. 

19. Hint: Choose a point $u$ such that $a < u < b$ and $f(u) = (a + b)/2$. Why is this
possible? Then consider the intervals $[a, u]$ and $[u, b]$.
CHAPTER 4


§4.3 Pages 103-105

1. $x^4 = 81 + 108(x - 3) + 54(x - 3)^2 + 12(x - 3)^3 + (x - 3)^4$.
3. (a) sin 2 x = x 2 - 5 x 4 cos 2X. 

log(l-x) log(l - X)- 3 1 

W 1-x L 2(1 -X) 3 J- 


cos X 


\X( 1 rV 


9. x = 


■ n 

2(1 + A)' 


§4.5 Pages 112-114 

1. (a) + 00; (b) 5; (e) e 2 ; (1)1. 

2. (b) 0; (e) 2; (f) 0; (g) +°°; (h) 1 . 

3. (b) 1; (d) e 2 ; (1) 1. 

4. (a) 0; (c) + °°; (e) §. 

7. (b) 0; (c) 0; (d) §. 


Miscellaneous Exercises Pages 114-115 

6. (a) 0; (b) e~ l . 

7. The limit is a n in the general case. 8. 1. 11 . 6-* 1 l(n + 2). 


CHAPTER 5 

§5.1 Pages 121-122 

1. S is open. B(S) is the set described in Exercise 2. 

2. S is closed. It has no interior points. 

3. S is closed. B(S) is composed of the line segment $y = 1$, $-1 \le x \le 1$, and the parabolic
arc $y = x^2$, $-1 \le x \le 1$.

4. S is neither open nor closed. B(S) is composed of the part of the curve xy = 1 in the 
first quadrant, and the nonnegative portion of each co-ordinate axis. 

5. S has no interior points. It is not closed. B(S) consists of S and the line segment 
x=0, -l^y^l. 

6. S is not open. B(S) consists of the semicircular arc y = V4 — x 2 , the segment y =0, 
-2^x^2, the segment x = 1/n, 0 < y ^ 1, and the segment x = 0, 0 < y ^ 1. The 
points of this last segment are in S. S is a region, but C(S) is not. 

7. This set is open, and therefore a region. Its boundary consists of the y-axis and the 
curve y = sin(l/x). 

8. This set is open. Its boundary consists of the half-lines x = nrr, y ^ 0, and the 

segments y = 0, 2 mr ^xS (2 n + \)tt, n = 0, ±1, ±2, 


§5.2 Pages 124-125 

2. No. 8. 5 = 2Ve. 10. 6 = Vi/2 will do. 12. Yes. 
§5.3 Page 127 

2. No. 4. (a) Yes; (b) no. 5. No discontinuities. 

7. Define /(x, x) = 0. 8. (a) No; (b) continuous elsewhere. 

9. No. 10. Yes. Define /( 0, 0, 0) = 0. 
CHAPTER 6

§6.1 Page 134 

„ du_yu + 4xv dv _ 4xu - yv 
dx - 2(u 2 + d 2 )’ dx ~ 2(u 2 + t> 2 )' 

- dz , dz dz y z-x 2 

3 -aI =2 ’^ = " 1 ' 4 -aT = F^' 

§6.2 Page 138 

1. (a)x + z = l; (c)y = z. 2. 5x + 7y-21z + 9 = 0. 
4. The angle is cos~'(l/V3). 

§6.3 Pages 143-144 

1. (3,4,5). 3. 3 , at (1, s). 

4. Min. at (6,6); no max. 5. 4 A\a 2 + b 2 + c 2 ) '. 

6. Max. I; min. —1. 7. Max. j; min. —1. 10. |V3. 

11. Max. 2e _1 at (0,±1). 

12. Max. 16 at (0, 2) and (2, 0); min. 0 at (0, 0). 

13. Max. 45 at (3, 3); min. -30 at (2, 1). 

14. 18. 15. jS4. 

18. (a) Along x-axis and at (0, 1); (b) x = 0, 0 < y < 1 ; 

(c) min. 1 at x = 0,0 £ y £ 1. 

19. 1 at (0,1). 20. 1 at (0,0,0). 

§6.4 Page 153 

4. dz = it 2 . 5. 0.02. 


§6.5 Pages 159-162 


2- ff = -2y §£ + 2x (partial answer). 


^ dG _ dF dF dF dG dF 

du dX dy dz 9 d\V 

2 y \ 2 


“•(SHE)' 


dz 

22. (a) 3. 


= — (partial answer). 


§6.52 Pages 166-168 


5. 


2 d 
d£ d 7)‘ 


9. (a) F(r) = Ar 1 + B, A and B constant. 


§6.8 Pages 186-188 

1. abc/27. 4. A 2 /(3 abc). S. (Va + Vb + Vcf. 1. a 2 b 2 . 

8. (a 3 ' 2 + b 3/2 + c m )-\ 10. (c) Max. !, min. 1 ; (d) 9. 

11. 8A 3 /(27 abc). 13. bc/(l + b 2 ) 1 ' 2 . 14. |a-b|. 

19. Max. 54; min. 153 -45V5. 20. (2 af)~‘. 21. 2 a 2 . 

22. j. 23. (13 - 4V3) 1 ' 2 = 2x/3 - 1. 
§6.9 Pages 193-194

1. Semi-axes 1, V2/2. 2. Max. at (2/V5, -1/V5). 

3. Max. 10. 4. Max. 2; min. -1. 

5. (a) X, = V2 + 1, X 2 = 0, X 3 = 1-V2; (b) X, = 18, X 2 j=X 3 = 9. 

6. (a) X, = 1, X 2 = X 3 = -r, (b) 1; (c) x = y = z = ±^. 

7. (a) and (b) hyperboloids of one sheet; (c) ellipsoid. 

8. Max. 16; min. 4. 9. 

Miscellaneous Exercises Pages 194-195 

t 4,4 

1* d — 15, b — 5. 

3. (b) (0, a), (a, a), (a, 0), (fa, fa); (c) rel. max. at (fa, fa), others neither max. nor 
min.^ (d) no. 

4. 24V3. 5. (8,4, 12). 6. Max. 25; min. S- 

7. 1 = 1 + 1/V5, - = V5/2. 8. Max. 4; min. 3. 

X z 

9. (a q 4- b q + c q ) 1/q , where q = 1/(1 - p). 

11. Max. at (0, 1); min. at (f, 0). 

12. (a) 5V5; (b) V5; (c) no in (a), yes in (b). 


CHAPTER 7 

§7.1 Page 199 

1. Yes. 

§7.4 Pages 206-207 

1. (a) 1/V3; (b) 1/V3; (c) | (approx.). 7. (a) 1/V5; (b) f; (c) i 

§7.5 Pages 210-211 

1. sin h sin fc = hk - lh(h 2 + 31c 2 ) cos 0/i sin Ok - §k(3h 2 + k 2 ) sin 6h cos 0k. 

3. F(3 + h, 3 + k) - ^ (18h 2 - 18hk + 18k 2 ) + (6h 3 + 6k 3 ). 

4. log 2 + f[(x - 1) + y]-s[(x- l) 2 + 2(x - l)y -y 2 ] + * • *. 

5. 2(x — 1) + y + f[2(x — l) 2 + y 2 ] + • • *. 

7. 1 + (2xy - y 2 ) + (2x 2 y 2 - 2xy 3 + fy 4 ) + 

10. (a)x-y; (b) lir + ik* - 3) ~ s(y - 1). 

11. (a) h 4 - h 2 fc 2 + k 4 ; (b) (ir 2 /96)(h + k) 4 -fh 2 k 2 . 

§7.6 Pages 220-221 

1. (b) Saddle at (0,0); min. at (0,1) and (0,-2). (d) Max. at (0,0); four saddle 
points, (e) Saddle at (0,0), (0,3), and (4,0); max. at (!, 1). (f) Max. at (a/2, a/3); 
degenerate if x = 0 or y = 0. (g) Saddle at (1, 1), (0, 1), (1»0); max. at (f, f). (j) Min. 

at (6, 8). (k) Min. at (6, 8); saddle at (-f, -§). 

3. 5 critical points. 

4. Shortest distance = V2. Saddle point at (x, -x), where x 3 + 2x - 1 = 0. 
5. Min. = $a$ if $0 < a \le 1$; min. = $\sqrt{2a - 1}$ if $a > 1$.

6. Min. at (-2, 0, 1); max. at (t, 0, -®). 

7. Max. at (1, 1, 1); min. at (-1, -1,-1). 9. Min. at x = y = z = i 

Miscellaneous Exercises Page 221 

4. Does not have tangent plane at origin. 


CHAPTER 8 

§8.2 Pages 228-230 

1. Yes. 2. /(I + h, k) ~ -2 h - k. 3. Yes. No. 4. Yes. 

7. (b) No. (c) A spherical surface and its center. 11. No. 

12. An infinite n umber of isolated points, among them (0, ttI 2), ( tt , 57t/2), (-tt, 7t/2). 

14. (a) f(x) — xVx — 1 . (c) (0, 0) and (1, 0). 

15. (b) /(x) = x 1 + x 512 . 16. a 2 * 3b 2 . 17. Yes. 20. No. 

§8.3 Pages 234-236 

1. Yes. 2. (a) UoUoWo7^0. 3. UoCo ^ 0, Xo ^ yo> Xoyo ^ 0. 

4. (b) x 0 ^ y 0 , y 0 * z 0 , zo * x 0 . 5. y 0 ^ z 0 . 

6. (a) Xo + yo ^ 0* (b) Zo 0, y 0 ^ 2xo. 


CHAPTER 9 

§9.2 Pages 251-252 

1. The circle u 2 + u 2 ^ 1. 

3. The triangle bounded by u = v, u = -u, u = 1. 

5. Regions in the first and third quadrants, bounded by y=x + 3, y = x-3, xy = 4, 
xy = 16. 

6. u = x + y, v = y/(x + y). The u-curves are parallel lines of slope —1. The u -curves are 
straight lines through the origin. The x-curves are hyperbolas with asymptotes u = 0, 
v = 1. The y-curves are hyperbolas with asymptotes u = 0, u = 0. 

8. (a) (1 - x — y) -3 . (b) x = u(l + u + u) _1 , y = u(l + u + u)"' 1 . (c) The u-curves are 

straight lines through (0,1), the u-curves are straight lines through (1,0). (e) A 

quadrangle with vertices at (i 1), (1, 1), (1, 2), ( 5 , 2 ). 

9. (a) Ellipses if u > 0 or u < — 1 ; hyperbolas if — 1 < u <0. (c) Along the lines x = 0 and 

y=0. 

§9.5 Pages 262-263 

1. (b) Yes; all poin ts where u = 0 a re singul ar, (c) The regions where y 2 > x 2 . 

4. u = [kVx 2 +ry 2 + x)] I/2 , v = [kVx 2 + 4y i - x)] 112 . 

6. (a) The u-curves are hyperbolas. The u-curves are ellipses, (b) Singular points 
correspond to the foci x = ±1, y = 0. 

8. (a) r 2 sin (p . (b) A sphere. A nappe of a cone. A half plane. 

9. The singular points correspond to the circle x 2 + y 2 = 1, z = 0. The u-surfaces are tori. 
The u-surfaces are spheres. 
§9.6 Pages 266-267

1. No. 2. ad — be = 0. 

4. (a) w = u 2 -2v; (b) w> = -(1 + uv)/(u + v ). 

Miscellaneous Exercises Page 267 

1. (c ) -r 6 . 2. r 3 sin 2 <j> sin t//. 


CHAPTER 10 

§10.11 Page 274 

1. (a) Neither; (b) perpendicular; (c) neither; (d) collinear. 

2. (a) 7; (b) 6; (c) 3; (d) 5. 3. i 7. 2sin(0/2). 

§10.12 Pages 279-280 

3. v, = i, v 2 = j, v 3 = k. 

6. (b) V = 4A — C. 

§10.2 Page 283 

1. (a)i-2j + k; (c) -lli- 5j -k. 2. VHo. 3. (b) h/ 69 . 

4. (a) -79, 79. 5. (e) 2; (f) both zero. 6. (A x B) • (C x D) = 0. 


§10.3 Pages 285-286 

5. (a) (1,26,3); (-16,-31,29). 

6. (a) -i' + j' - V3 k’ ; (b) — 2i — V2(j - k). 


§10.51 Page 295 


1 . 


2 . 


(a) l(x' 2 - 6x'y' + y' 2 — 2z' 2 ); (b) F(x, y, z) = 2y; 

(c) — x' + V2z', y'^=(x' — y') + z'; (d) _ (^| + 

(b) i>(-23x 2 + 41y 2 +31z 2 -48xy + 72xz + 24yz); 

(c) '(3 y' + 2z')i' - ?(3x' + 6z')j' - 7 L (2x' - 6y')k’. 


O' 


3y-z 

2V2 



§10.6 Pages 298-300 

1. (a) (4x - 3y - 4z)i + (2y - 3x)j + (12z - 4x)k; 

(d) r _5 [-3xzi - 3yzj + (r 2 - 3z 2 )k], where r 2 = x 2 + y 2 + z 2 . 

2. -t : lit :-t, where t = (123)“ 1/2 . 3. -i- 2j - 2k; rate = 3. 

4. (a) 0; (b) 14V2. 7. (e) nr B ~ 2 R; (t) -2e~ ,2 R. 8. (c)r“' + C. 


§10.7 Page 305 

4. (b) Ar -1 + B. 5. (b) Cr’ 3 . 6. (b) V ■ V = 0. 

§10.8 Pages 307-308 

3. (a) 0; (c) -i - j - k; (e) 0. 


y+f 

2V2 
Miscellaneous Exercises Page 308
2. (a) nr"~ 2 R • A and nr"” 2 R x A; (b) 0. 


CHAPTER 11 

§11-§11.10 Pages 331-334 

7. The locus $\|x\|_1 = 1$ is the periphery of the square with corners at $(\pm 1, 0)$, $(0, \pm 1)$. The
locus $\|x\|_\infty = 1$ is the periphery of the square with corners at $(\pm 1, \pm 1)$ (four
combinations of signs).


CHAPTER 12 

§12.1-§12.8 Pages 372-375 

9. (b) dF(f, h) = f'(u + tw)[w]h. 

11. g is not continuous at (0, 0). 

13. (b) DJ{ a) + D v /( a) = ||u + v|| D w /( a), where w = (u + v)/||u + v||. 


CHAPTER 13 

§13.3 Pages 389-390 

1. (b)5fl 3 ; (d )hrHa 2 ; (e) sir. abc; (f) iW a 2 h; (g) habc. 

3. la 3 . 

4. (a) (la, lb); (b) (l(a + b), |c); (c) «4a/3ir), 0); (e) (Ib, §H); (g) (a/2, \b). 
7. 7ra\ 


8. (a) 


/187 + 4V2 29+ 116V2\ , , /5 32, 

1 310 , 310 )> (c) (l4,35). 


§13.4 Pages 394-395 

1. (a) |a s ; (c) 57 ra 2 h; (e) ¥a 3 (37 r -4); (f) ?a 3 ; 


(g) 


[l + ^log(l + V2)j. 


12 


2. (c) y = 
(f)x = 


5a 
6 

a7rV2 

8 


/ . a 3V3 + 877 
(e) x = — 1 . r =~. ~Z " > y = 


11a 


2 3V3 + 277 2(3V3 + 277)’ 


3 One part is |a 3 (12V37r — 20). 5. 7 t/32. 


§13.5 Pages 399-401 

1 . (a) M = lka 2 b, x = la,y = lb; (c) M = \ka 3 , x = u^a, y -la; 
(e) M = Ikira 3 , x = y =\alir; (g) M = ^kab 2 ,x = ^a f y-lb; 
(i) M = x = fa, y = |>a. 
3. (b) l x = jMa 2 ; (d) L = \Mb\ I y = iMa
tMa 2 , I y = rMa 2 ; 

2 sin 3 0 cos /3 \ 


(f) h 


3/3-3 sin /3 cos /3 
2 sin 3 /3 cos /3 \ 
j3 - sin /3 cos /3 /' 

. crfl4 Ma 2 Ma 2 . 

4 - (b)_ F = Tir ; (d) — (f)0 - 


7. (a) 


<ra 3 b 3 
6 (a' + b s )’ 


(0 


4 

656cra 

35 


8. (a) (1 + V2)y = x, y ~-(l + V2)x; (c) y = ±x; (e) y =0,x 

9. lira 6 about y = -x, iWa 6 about y = x. 


§13.51 Page 404 


4 . 2 <rb log(l + V2). 5. 4o-b. 

6 . u = 2cr ( b tan -1 r==~=r 4- — log 

V vV+2a 2 2 


Vb 2 + 2a 2 - a/ 


irab 

2 


§13.7 Pages 411-412 

2. (a, j, I). 3. (a) (la, \b, Ic); (b) reAf(a 2 + c 2 ). 

4. x = reira, y = la. 5. (b) 2 V 2 — V3 — 1 . 

7. (a)(ia,ib,|c); (b) ;M(i> 2 + c 2 ). 

8. (a) M = 8a 3 , avg. density = 1; (b) h = iMa 2 ; (c) if. 

§13.8 Page 413 

2. lMa\ 3. (a) iMa 2 ; nM(3a 2 + h 2 ). 

§13.9 Page 416 

4. (b) 1 + 2\/2; (c) sir ^a. 


CHAPTER 14 

§14.2 Pages 420-421 

A A J 'a 2 , 1 , <0 + V2+ O) 5 

2. Avg. speed = sV2 + a) + — log — 7 = . 

a) v2 

3. 4ira. 9. aV 2 sinhti. 

§14.32 Pages 426-428 

5. (a)T = i,N=jJl = k,K=2,T = 3; (b) tiVI; 

. . 216V334 9 

(1621) 5,5 a’ r _ 167 a’ 

(d) T = 1, N = -jV2(j + k), B = W 2 (j - k), k = IV 2, r = 0; 

(e) k = (1 + 2 e 2 ') ,,2 (l + e 2, r m , r = 2 e'(l + 2 e 2 T‘; 

(g)T = jl + |k,B = -5i + |k,N = j, k = t = b. 
8. (a) T = -i, N = ■—= (2j - k), B = — ~= (j + 2k), k = t = 0;

V5 V5 2a 

(b) T = - Y (V2j - k), N = (61 + j + V2k), 

® r~Z 9 2j 2 V 2k) , K j T , 

V 13 9a 13a 

(c> T = 7TS (■ + k), N = -j, B = - 4 = (i - k), k = r = 

V2 V2 2a 8a 


§14.4 Pages 432-433 

3. (a) (x 2 / a 2 ) + (y 2 /b 2 ) = z 2 ; (c) (x 2 /a 2 ) + (y 2 /b 2 )-(zV) = 1; 

(e) (x 2 la 2 )-(y 2 lb 2 ) - z; 

(f) The part of the cone ( x 2 la 2 ) = (y 2 lb 2 ) + (z 2 lc 2 ) on which iSa. 

6. Cylinder, x 2 + y 2 = a 2 . 7. Plane, x + y = z. 

§14.5 Page 436 

1. Circles in which planes through z-axis intersect the torus; ds = bd</>. 

3. The u-curves are circles y 2 + z 2 = a 2 in planes perpendicular to the x-axis; the e-curves 
are straight lines parallel to the x-axis. ds 2 = du 2 + a 2 dv 2 . 

5. (a) ds 2 = (1 + 4k V) du 2 + u 2 dv 2 ; 

(c) ds 2 = (1 + r 2 sin 2 20) dr 2 + r 3 sin 40 dr d8 + r 2 (l + r 2 cos 2 20) d0 2 ; 

(e) ds 2 = (1 + 4u 2 + 9u 4 ) du 2 + 2(1 + 4ur + 9u 2 e 2 ) du dv + (1 + 4e 2 + 9e 4 ) dv 2 . 

6. (b) 0 tan <0 = log tan(jk + zrr). 

§14.6 Pages 443-^44 

2. 8V2a 2 . _ 3. 2iraV2. 4^ |ir[(l + aY 2 - 1]. 

5. jirab (2 V 2 - 1). _ 6. (a) jV2(a + b)Vab ; (b) jV2-. 

7. 7r[V2 + log(l + V2)]. 9. 8a 2 (ir - 2). 11. isW. 

15. 2a 2 . 


CHAPTER 15 

§15.12 Pages 449-451 


8 

3- 77. 


1. (a) (b) t; (C) -t; (d) 2 ~¥; (e) h (f) -2; (g) h (h) 

2. (a) v ^rkb) 16; (c) \ira 2 \ (d) -lira 4 ; (e) 5 tt; (f) 4 tt; (g) 2 tt; (h) \ab 

3. (a) * 77 ; (b)57r; (c) -lir. 4. ^7r(a 2 + 1). 6. i 7. ^ + 27r. 


O ^ ^ A / \ 16 

8 T? 9 - (a)_T; 


32 . . 32 

(b) — (c) 


3V3 


9V3 


L 


14. 2n- I xy dy. 


§15.13 Pages 454-455 

1. 2a 2 . 2. 2V2a. 3. 2w+f. 4. (a) 2a; (b) none. 

5. 50 ft. lb. 
§15.3 Pages 462-463

1. (a) 56; (b) 0; (c) 4ir; (d) 0; (e) ~Wa 4 ; (f) 0. 

§15.32 Pages 468-469 

1. 53s. 2. ( a 2 +b 2 ) 3. 55 . 4. 3c = 5j. 5. jmtb. 

6.3. 7. j(e - e~‘). 8. 4(1 - log 2 ). 9. t. 11. a. 

12. 1. 14. 2 log 2 -5. 

§15.41 Pages 477-478 

1. (b) and (e) not exact. 2. (b) -§; -'-f . 3. (b) 2 log 5. 

4. w = e x ; e x (x sin y - sin y + y cos y). 

5. (b) Along y = 0, x > - 1 ; (c) along y = 0, x 2 < 1 . 


§15.51 Pages 483-484 

1. z = ^. 2. (a) 9u; (b) 87 t; (c) ttt. 

3. VI 4.0. 5. k 2 a. 6.2. 

7. A = 2a 2 (7r - 2), Ajc = fa 3 , Ay = 5a 3 (37r - 8), Az = 7ra 3 . 


8. 16aV 


3 tt -7 




where e is the total charge. 


47 t Sira 2 tt . ., 

15. — 3~’ 3 c T ’ ~3~ ln corres P° n ^ in 8 cases. 

47 jcra 1 

16. 


Va 2 + c 2 + |a - c| 


§15.6 Pages 490-492 
4. 3a 4 . 6. 4irb 5 . 8. 87rh 2 . 


§15.62 Pages 497-499 

2. (a) 900; (b) (5, 5, -2). 3. iW(a 2 + b 2 )abc. 

4 313 _1 -1 ,1 —21 

. 32 7 T. 5 . 4. 6. 24 ana 5. 7 . ir 7 r. 


§15.7 Pages 504-505 

1 . - 2 V 2 irb 2 . 2. -ir. 3. —no 1 . 4. (d) -4ira 2 . 5. 0. 

6. (a) I(3 tt - 8)a 2 ; (b) tto 2 ; (c) la 2 . 

§15.8 Pages 509-510 

1. (b) aVc; (d) 1. 2. a -i + p + p - l. 

3. x 2 y + y log z. 4. ** + ^ + Z * - 3. 5. (xy - z)<T* y . 

6 . (b) x 2 + y 2 < a 2 , z = 0 ; (c) x 2 + y 2 = 0 , z^ 0 . 
Miscellaneous Exercises Pages 510-511

2. e x ^os y - 3 sin -1 x + 7 tan y. 4. Discontinuous along y = 0, x 2 > 1. 

5. V3. 7. 8/(3ir). 

9. 4mr(b/a){(a + b)E(k) + (a - b)K(k)}. 10. 7/16. 

CHAPTER 16 

§16.2 Pages 516-517 

1. (a) Open; (b) closed; (c) neither; (d) neither. 2. 0 and 1. 

3. (a) No; (b) yes, 0 and 1. 

6. (a) 1. Not closed, (b) 0. Closed, (c) 1 and -1. Not closed. 

§16.41 Pages 521-522 

1. Yes. 2. (c) x = 0, -1 ^ y ^ 1. 

3. Neither. All points of the square $0 \le x \le 1$, $0 \le y \le 1$.

CHAPTER 18 

§18.1 Page 539 

2. (b) 1. 3. I = 2, J = 4. 

§18.21 Pages 547-548 
1. la F(x)f'(x) dx. 

§18.4 Pages 549-550 

4. I 

7. (b) F(x) = -jx 2 + j if x < 0, F(x) = i + jx 2 if x > 0. 

8. (a) F(x) = -x - 1 if x <0, F(x) = — l+xifx>0. 

§18.5 Page 554 

1. F'(u) = 2.(1 -a 2 )-" 2 . 

Jo 1 — u X 

3. eT x4 -2x j t 2 e~ x2a dt. 

f*sin x 

4. (x 2 - sin 2 x) n cos x - (x 2 - x 4 )"2x + I 2 2xn(x 2 - t 2 ) n ~ l dt. 

7 - 1 ^ = p / ( x ’i) + L /,(x ’° <it - 

W = 7 f (l’ y ) + L Ms ’ y)ds - 

§18.6 Pages 556-557 

1. Yes. 2. Yes. 3. No. 

4. No. Use the fact that / is uniformly continuous. 
§18.9 Page 565

1. 3 in each case. 2 . — 5, 3, 3 respectively. 

3. 5, -15, 35 respectively. 4. (a) 2e _1 - 1; (b) \{eT 2 - 1). 

5 . (a) —3 log 2 ; (b) mn 2 . 

6. (a) 0, 1, 1, 3 respectively; (b) 4 V 7 - V 2 - 3 . 

7. (a) w(0) = 0, w(x) = 1 + 5 JC if 0 < x < 5, w(5) = 3, w(x) = 2 4- |x if 5 < x < 10, 
w(10) = 8; (b) 40 and ™ respectively. 

8 . (a) Second moment about y-axis is 

la x 2 dA(x ); (b) y J b dA(x) = 3/* <t>(x) dA(x). 

9. xtf dV(x) = f* xdV(x). 


CHAPTER 19 

§19 Pages 568-569 


1 . (b) Ixl < 1, 


8x , . , . 1 10 

(d)M>l,— 


(f) x > 0, 


(h) ^ < x < e, 
e 1 


logx' 

4. (a), (b), (d) are divergent; (c) is convergent. 


1+x 


5. Divergent. 


§19.2 Pages 576-577 

1 . (a) and (b) divergent; (c) and (d) convergent. 

5. (b), (c), and (f) divergent; (a), (d), (e), and (g) convergent. 


§19.21 Page 579 

1 . (a), (d), and (g) divergent; (b), (c), (e), (f), and (h) convergent. 
3. p > 1. 5. p > 1 and any q. Also p = 1 and q < - 1. 

§19.22 Page 581 

1 . (c) and (e) divergent; (a), (b), (d), (f), and (g) convergent. 

3. (a) and (d). 


§19.3 Page 585 

3. (a) and (d). 

§19.32 Pages 589-590 

2 . (b) and (d) divergent; (a) and (c) convergent. 

§19.4 Pages 595-596 

1. (b) Convergent if |x|^ 1, divergent if |x|> 1; (d) convergent if |x| < i, divergent if 
|x|^|; (f) convergent if |x|<2, divergent if |x|>2, indecisive if |x| = 2; (h) con- 
vergent if |x| <3, divergent if |x| > 5, indecisive if |x| = 

2. (b) Convergent if |x| < 1 , divergent if |x|^l; (d) indecisive; (f) convergent if 

|x| < V2, divergent if |x| ^ VI; (h) convergent if |x| < 5, divergent if |x| ^ \. 
3. (b) Convergent; (d) divergent; (f) convergent if p > f, divergent if p < f, indecisive if
P=i 

4. (a) Divergent; (c) convergent; (e) convergent if p > 2, divergent if p S 2; 

(f) divergent if p = f (see answer to 3(f)); (g) divergent. 

8. (b) The first two series are convergent; the third is divergent. 

Miscellaneous Exercises Pages 607-609 

1. Convergent if and only if -1 ≤ x < 1.

6. (a) Convergent, others divergent. 

7. (b) Convergent if and only if -2 ≤ x < 2. 8. x >

9. (a) No. Yes. (c) All values, (d) Convergent. 

10. (a) Convergent if and only if |x| ≤ 2. (b) Convergent if and only if |x| ≤ 1.

(c) Convergent if and only if -e^{-1} ≤ x < e^{-1}.


CHAPTER 20 

§20.1 Pages 617-618 

1. (a) No. (b) 0 < a < b < π. (e) ab > 0.

4. (a) 1 + x^2. 0. Yes. (c) No. 5. (b) No. No. Yes.

§20.2 Page 620 

1. Uniform convergence for the ranges of x as indicated: (a) |x| ≤ c if 0 < c < 1.

(b) All x. (c) All x. (d) x ≥ c if c > 0. (e) All x on any finite interval.

§20.3 Page 621 

1. (a) Not uniformly convergent on any interval having 0 inside or at one end. 

3. The function is continuous at 0. 

§20.4 Pages 623-624 

4. JV f(x) dx = 0; Jo f n (x) dx tt/2. 

§20.5 Page 626 
1. All but (a) and (b). 


CHAPTER 21 

§21.1 Pages 631-632 

1. (a) i (b) 27. (c) 0. (d) ∞. (e) ∞. (f) 4.

2. ∞ if p < q; 0 if p > q; p^{-p} if p = q.

3. 1. 9. 0.0976. 10. 0.4864. 11. 0.0045. 

§21.2 Pages 637-639 


12. 0.482. 
12. L_q(x) = q![1 + Σ_{n=1}^{q} (-1)^n [q(q-1)···(q-n+1)/(n!)^2] x^n].

13. H_m(x) = (2x)^m - [m(m-1)/1!](2x)^{m-2} + [m(m-1)(m-2)(m-3)/2!](2x)^{m-4} - ···
(breaking off with the term of exponent 0 or 1 according as m is even or odd).


§21.3 Pages 642-643 


1. (a) 1 + \x - ix 3 + siox 5 - • • 
(c) 1 + X + \x 2 + IT * 3 + ■ * *. 

(e) 1 — 2X — T2X 2 — 24X 3 — • * ■ 

(f) —3* ~ 45X 3 — 945X' — • • 


2 . (a) 


1-x 


(b) 


log(l + x) 
1 -x ' 


(c) 


X 


§21.4 Page 646 

3. (b) 0.09. 4. p > 3. 


§21.5 Pages 649-650 

1. (a) 1. (b) e'\ (c) 3. (d) e. (e) ∞. (f) 1.

4. (a) 1. (b) 1. (c) 1. (d) ∞. (e) 1. (f) 1.


CHAPTER 22 

§22.1 Pages 659-660 

1. (a), (b), (d) convergent; (c), (e), (f) divergent.

2. (a) Convergent, (b) Divergent. 

3. (a), (b), (d), (e), (f), (g), (i), (j), (k), (n), (p) convergent; others divergent. 

5. m! 7. m + 1 < n.

§22.11 Pages 663-664 

1. (a) Divergent, (c) Divergent, (e) Convergent. 

2. (a), (b), (c), (d), (f), ( 3 ) convergent; others divergent.

4. Improper but convergent if 0 < a < 1.

5. n = 2, 3. 6. (b) 0 < x < 1. 8. 0 < p < 1.

§22.12 Pages 665-666 

1. (b), (c), (d), (e), (g), (i) convergent; others divergent.

2. (d) Divergent; others convergent. 

3. (a) 1 < p < 2. (b) 0 < x < 1. (c) p < 1. (d) -1 < p < 1.

(e) 0 < a < 2. (f) 0 < x. (g) 0 < x < 1. (h) 0 < a.

4. To the left of x = 1 and above x + y = 1. 

§22.2 Pages 669-670

4 . (a) 3 77 . (b)|V 7 r. 6 . (a) V 7 t/ 2 . (b)iigV'n-. 
§22.3 Pages 672-673

1. (b) Abs. convergent. (d) Convergent. (f) Abs. convergent.
(h) Abs. convergent. (j) Abs. convergent. (l) Convergent.
(n) Abs. convergent. (p) Convergent. (r) Abs. convergent.
3. Yes, whenever 1 < p < 3. 4. Yes, whenever 1 < p < 4.

§22.4 Pages 676-678 

1. (b) Divergent; others convergent. 4. p < 4.

5. (a) 2π. (b) (n < 3). (c) Convergent.

(d) (n < 5). (e) Divergent. (f)

6. (a) Convergent, (c) p < 1. 

§22.41 Pages 681-682 

2. No. 5. The value is π/4.


§22.5 Pages 687-690 
9. Virx. 11. jir log(b/a). 


§22.51 Page 692 


i I , 1 — 2t , 1 I 

3. 2 ”1” 6^- $6 . 

4. (a) — j cos 3t + j sin 3t + s cos 2 1. (b) 2 sinh 2 1 + sinh t. 

5. (a) Γ(n)(s + 1)^{-n}. (b) (s^2 + 2s + 2)^{-1}. (c) 3(s^2 - 4s + 13)^{-1}.


§22.7 Pages 698-699 


5. 

7. 


-5<a. 

(a) 8. (b) 


2"4!5! 
11 ! ‘ 



(d) 


Virr(j) 

12T(|) ' 


Miscellaneous Exercises Pages 705-708 
2. Both convergent. 

23. By choosing a = .0545632, the distance between the upper and lower bounds for
n!/(n^n e^{-n} √(2πn)) is reduced about 17% below what it was when a was taken to be n-

(This is the reduction when n = 10).
Abel, Niels H., 605
Abel’s summability identity, 605 
Abel’s test, 607 
Abel’s theorem, 644 

Absolute maximum, minimum, 22, 23, 128 
Absolute convergence 

of improper integrals, 670, 674 
of infinite series, 582 
Absolute value, 3, 74 
Acceleration vector, 425 
Accumulation point, 515, 520 
Affine function, 318 
Alembert, d’, see d’Alembert 
Alternating series, 587-588 
Ampere, 101 
Analytic function, 650 
Approximating sum, 39 
Arc, 418 

Archimedean law, 78 

Arc length, 418-419 

Areal density, 377 

Area of a surface, 437ff., 478-479

Area under a curve, 39, 536 

Asymptotic approximation, 703 

Average value of a function, 46, 383 

Axiom of continuity, 77 

Axis of reals, 80 

Azimuth, 413 

Ball (open, closed), 321 
Basis, 276 

Bernoulli numbers, 643 

Bessel functions, 635, 637 

Beta function, 695-696 

Binomial series, 597-599 

Binormal, 425 

Bolzano, Bernard, 517 

Bolzano-Weierstrass theorem, 517, 520-521 

Borel, Emile, 523 

Boundary point, 119 

Bounded function, 86, 125-126, 529 

Bounded linear transformation, 324 

Bounded sequence, 61 

Bounded set, 86, 125, 321, 517 

Bound of a linear transformation, 324 

Bracket function (definition), 2-3 

Calculus, 1-2, 55 
Cartesian product, 363 
Cauchy, Augustin-Louis, 55 


Cauchy’s convergence condition, 522, 617 
Cauchy’s form of remainder, 102, 104, 105 
Cauchy’s generalized law of the mean, 95 
Cauchy-Schwarz inequality, 333 
Cauchy’s inequality, 118, 275 
Cauchy’s root test, 594, 595 
Center of curvature, 35, 425 
Center of gravity (or mass), 377, 409 
Centroid, 389 

Chain rule, 15, 16, 155-156, 202-203, 338-339 

Characteristic value, 370 

Characteristic vectors, 370 

Class C^(1), 354

Closed interval, 20, 515 

Closed set, 119, 321, 515, 516 

Cluster point, 647 

Colatitude, 413 

Column matrix, 317 

Comparison test 

for improper integrals, 656 
for infinite series, 574 
Complement (of a set), 118, 514 
Composite function, 15, 154
Composition of two functions (notation for), 
241, 322 

Conditional convergence, 583, 670 
Connected set, 205, 533 
Constraint, 178 
Continuity 

and boundedness, 86, 126, 529 
and extreme values, 89, 126, 529 
and intermediate values, 90, 534 
and sequential limits, 527-528 
definition of, 7, 9, 85, 125, 321-322, 527-528

of sums, products, and quotients, 85, 125, 
532 

Continuous differentiability, 354 
Continuous variable, 54 
Convergence 

of an improper integral, 654, 656, 674, 
678-679 

of an infinite series, 567 
of a sequence, 59, 321, 521 
Convex set, 205 
Co-ordinate-free, 269, 290 
Coulomb’s law, 401, 409 
Cramer’s rule, 238 
Critical point, 215 
Cross product, 280, 288 
Curl, 306
Curvature 

center of, 35, 425 
of a plane curve, 34, 35 
of a space curve, 424 
radius of, 425 
Curve 

definition, 417 
rectifiable, 419 
sectionally smooth, 418 
simple closed, 418 

Curvilinear co-ordinates, 250, 257, 258ff.

Cut number, 77 
Cylindrical co-ordinates, 412 

d’Alembert, J. le R., 590 
Darboux, J. G., 543 
Darboux’s theorems, 93, 543, 556 
Decimal fractions, 79 
Dedekind, 78 

Definite integral, 38-39, 537 
Denumerability, 512 
Derivative 

definitions, 12, 338 

notations for, 12, 130-132 

of a function from R^n to R^m, 338

of an integral, 47-48, 549 

one-sided, 13 

partial, 130 

second and higher order partial, 130-132 
Dieudonné, Jean, 348
Difference quotient, 12 
Differentiable function, 13, 145, 146, 150, 337 
Differentiability 

and continuity, 15, 19, 147, 338 
and tangent plane, 149-150 
definition of, 13, 145, 146, 150, 197, 336-337 
necessary and sufficient conditions for, 
341 

of a function from R^n to R^m, 337

of a real function of a real variable, 32, 

145 

of a real function of n real variables, 150, 
197, 336-337 

of a real function of two real variables, 

146 

of a scalar product of vector functions, 
367 

sufficient conditions for, 197, 199 
Differential form, 469, 505 
Differential geometry, 417 
Dimensionality 

of a vector space, 311-312 
of R^n, 276


Directional derivative, 297, 343 

Dirichlet, P. G. Lejeune, 605 

Dirichlet’s test, 604-606 

Discriminant of a quadratic form, 190, 193 

Disjoint sets, 120 

Distance 

in a vector space, 319 
in R^n, 275

Distance function, 320 
Divergence 

of an improper integral, 654, 656 
of an infinite series, 567 
of a vector field, 301, 491, 492 
Divergence theorem, 484, 488 
Domain of a function, 2, 243 
Dot product, 270, 274 
Double integral, 380, 554ff. 

Duhamel’s principle, 419, 447, 545-546 

e (the base of natural logarithms), 64, 66 

Electrostatic field, 302, 402 

Ellipse of inertia, 400 

Ellipsoid of inertia, 412 

Elliptic integrals, 420-421 

Empty (or void) set, 80, 206, 512 

Euclidean norm, 319 

Euclidean spaces R^2, R^3, R^n, 268-269, 274-275

Eudoxus, 79 

Euler, Leonard, 170 

Euler’s constant, 579, 608 

Euler’s theorem, 170, 172 

Even function, 638 

Evolute, 35

Exact differentials, 469-470, 505-506 
Existence problem, 65 

Extremal problem with constraints, 177ff., 182ff.
Extremum, 20 

Factorial, 667 
Field, 73 

Finite-dimensionality, 311
Finiteness of a set, 512 
First-class variables, 155 
First moment, 399 
Fluid kinematics, 162-164 
Fourier transforms, 655 
Function 

affine, 318 

bounded, 86, 125-126, 529 
composite, 15, 154 

continuous, 7, 9, 85, 125, 321-322, 527-528 
continuously differentiable, 354 
definition of, 2, 243 
differentiable, 13, 145, 146, 150, 337 
elementary, 16, 37
even, 638 

extreme values of, 88ff., 126, 138ff., 177ff.
homogeneous, 168-169 
implicitly defined, 132, 222-223, 361ff. 
integrable, 537, 555 
multiple-valued, 2
odd, 638 
of class C^(1), 354
of several variables, 116ff. 
real analytic, 650 
single-valued, 2 
value of, 3, 243 
Functional dependence, 266 
Functional notation, 3 

Gamma function, 655, 666ff. 

Gauss, Karl Friedrich, 593 
Gauss’s test, 593, 595 
Gauss’s theorem, 457 
Generalized inverse, 369 
Generalized solution, 368 
General solution, 36, 37 
Geometric mean, 187-188 
Geometric series, 566 
Gibbs, Josiah W., 309 
Global theorem, 361 
Gradient, 296, 297, 342-343 
Gradient field, 506 
Gram, J. P., 277 
Gramian, 277 

Gram-Schmidt process, 277 
Gravitation, 294 

Gravitational field, 294, 409-410, 675 
Greatest lower bound, 81 
Green’s identities, 492 
Green’s theorem, 457, 463-464 
Gundelfinger’s rule, 219-220 

Harmonic series, 574, 578 
Heine-Borel theorem, 523, 525, 531 
Heine, Edward, 523 
Helix, 428 

Hermite polynomial, 639 
Hesse, Otto, 354 
Hessian, 353 
Hölder’s inequality, 188
Homogeneous function, 169 
Hospital’s rule, 107 

Identity transformation, 239-241 
Image, 248 

Implicit function concept, 132, 222-223 


Implicit function theorems, 225, 228, 232, 364, 
365 

Improper integral, 559, 577, 654, 673-674, 678- 
679 

Inequalities, 74 
Infinite-dimensionality, 312 
Infinite series 

absolutely convergent, 582 
alternating, 587-588 
conditionally convergent, 583 
convergent, 567 
definition, 567 
divergent, 567 
geometric, 566 
harmonic, 574, 578 
multiplication of, 600-602 
Infinity, 55 
Inner product, 270 
Integers, 72, 73, 76 
Integrability 

of absolute value of a function, 539 
of a continuous function, 539 
of a function, 537 
of a monotonic function, 540 
Integral 

as a limit of sums, 67, 543 
improper, 559, 577, 654, 673-674, 678-679 
iterated, 386, 406, 557 
of a derivative, 49, 550 
Integral test, 578 

Integration, theory of, 38ff., 535ff. 

Interior point, 119, 321 
Intermediate-value theorem, 90, 534 
Intersection of sets, 120, 320 
Interval of convergence, 629 
In the large, 250, 428 
In the small, 250 
Invariant, 287, 288 
Inverse 

of a linear operator, 327 
of a transformation, 240, 242-243 
of differentiation, 35, 48 
Inverse function theorem, 242, 356-357 
Inverse function theory, 237ff. 

Inversion theorem, 356-357 
Invertible operator, 327-328, 330 
Irrotational field, 306 
Isomorphic vector spaces, 312 
Iterated integral, 386, 406, 557 

Jacobi, Carl, 175 
Jacobian 

determinant, 175 

identical vanishing of, 264, 266 
matrix, 341 
sign of, 251, 466 
Jordan curve, 418 

L(R^n), 327
L(R^n, R^m), 313

Lagrange, J. L., 101, 182 
Lagrange’s form of remainder, 100, 104, 105, 
106 

Lagrange’s method, 182ff. 

Lagrange’s multiplier, 183 
Laguerre polynomial, 639 
Lamina, 395 

Laplace, Pierre Simon de, 303 
Laplace’s equation, 303 
Laplace transform, 655, 690 
Laplacian, 303, 492, 494 
Law of the mean 

extensions of, 95, 99, 102 
formulations of, 26, 58, 204, 206 
for vector functions, 350-351 
Least squares solution, 368 
Least upper bound, 80 
Leibniz, 2, 55 
Leibniz’s rule, 552 
Level curve, 128 
Level surface, 129 
Limit 

defining a definite integral, 67 
from right or left, 5 
of a function, 4, 54, 55, 122 
of a sequence, 59 

of sums, products, quotients, 7, 53, 68 
Limit inferior, 647 
Limiting processes, 4 
Limit point, 515 
Limit superior, 647 
Linear algebra, 309 

Linear dependence, independence, 276 
Linear functional, 316 
Linear operator, 312, 327 
Linear space, 311 
Linear transformation 
definition of, 312 
inverse of, 312, 327 
matrix representation of, 314-315 
norm of, 325-326 
Line integral, 446-447 
Localization, 223, 250, 264, 361, 428 
Locus, 223 
Lower bound, 81 
Lower sum, 40, 536 
Loxodrome, 436 


Maclaurin’s series, 652 
Mapping, 247ff. 

Mathematical induction, 75, 77 
Matrix product, 313-314 
Matrix representation, 315 
Maxima and minima, 20ff., 138ff., 177ff., 182ff.
Mean-value theorem 
for derivatives, 26 
for integrals, 45, 105, 383 
Mesh fineness, 542 

Method of steepest descent, 344ff., 370-371 
Metric, 320 

Minkowski’s inequality, 188 
Moebius band, 503-504 
Moment 

first, 399 
of inertia, 397 
second, 398 

Monotonic function, 540 
Monotonic sequence, 60, 61 

Neighborhood, 20, 118, 120, 321, 514, 520 

Nested intervals, 83 

Newton, 2, 55 

Newtonian potential, 675 

Newton-Raphson method, 347 

Newton’s law of gravitation, 409 

Newton’s method, 347, 349, 371 

Nondegenerate critical points, 215, 218 

Nonsingular operator, 327-328 

Norm, 271, 275, 318, 325-326 

Normal derivative, 456 

Normal to a surface, 136-137 

Normal vector, see principal normal 

Normalized vectors, 273 

Numbers 

complex, 72 
irrational, 72 
natural, 72, 75-77 
pure imaginary, 72 
rational, 72, 78 
real, 72 

Oblique co-ordinates, 257 
Odd function, 638 

One-to-one correspondence, 248, 250, 512 

One-sided surface, 504 

Open interval, 20 

Open set, 118, 321, 514, 520 

Ordered field, 74 

Orientable surface, 503-504 

Orientation 

of a curve, 446 

of a surface, 500, 503-504 
Orthogonality, 137

Orthogonal vectors, 273, 275 

Orthonormal triad (or set), 273, 275 

Osculating plane, 425 

Osgood, W. F., 546 

Outer content, 555 

Outer normal, 484 

Pappus, 440, 460 
Parallelogram law, 271 
Parametric representation 
of a curve, 417 
of a surface, 429 
Parametric surface, 429 
Partial derivatives, 130ff. 

Partial differentiation 

changes in order of, 199ff. 
technique of, 130ff., 132ff., 154ff., 164ff. 
Partial sum, 567 
Particular solution, 36 
Partition, 536 
Partition weighting, 545 
Pi (π), computation of, 573
Point function, 291, 293, 445 
Point set, 118, 512, 514, 520 (see also Set) 
Polar co-ordinates, 255-256, 390-392 
Polygonal arc, 205 
Polynomial of degree n, 10-11
Potential, 403, 410, 675 
Power series, 627 
Preimage, 248 

Principal axes of inertia, 400, 412 
Principal normal, 424 
Product of inertia, 399, 409, 411 
Product of two transformations, 253 
Pythagoras, 79 

Quadratic forms, 189, 193, 218 

R^n, see Euclidean space R^n

Raabe’s test, 591-592, 595, 596 
Radical sign (meaning), 14 
Radius of convergence, 629, 648 
Radius of curvature, 425 
Radius of gyration, 397 
Range of a function, 2, 243 
Rational fractions, 72 
Rational function, 11 
Ratio test, 580, 590 
Real number scale, 80 
Real number system, 72, 78 
Rearrangement of a series, 586-587 
Rectifiable curve, 419 


Region 

convex, 508 
definition of, 120 
regular, 451 
Riemann, 556 
simply connected, 475 
x-simple or y-simple, 458 
xy-simple, 485 
xyz-simple, 487 
xyz-standard, 487
Relative extreme 

necessary conditions for, 139 
sufficient conditions for, 211-213, 215, 217 
Relative maximum or minimum, 20, 138 
Remainder, see Taylor’s formula with 
remainder 
Residual, 368 

Resultant of two transformations, 253 

Riemann, Bernhard, 535 

Riemann integral, 535 

Riemann region, 556 

Rigid motion, 283 

Rolle, Michel, 27 

Rolle’s theorem, 27 

Rotation of axes, 189-192, 255, 284 

Row matrix, 317 

Saddle point, 139, 215 

Scalar, 269 

Scalar field, 295 

Scalar point function, 291 

Scalar product, 270 

Schlömilch’s form of remainder, 104

Schmidt, E., 277 

Schwarz, H. A., 201 

Schwarz inequality, 332-333 

Schwarz’s theorem, 201, 202 

Second-class variables, 155 

Second moment, 398 

Sectional smoothness, 418 

Separated sets, 533 

Sequence 

bounded above, below, 61 
convergent, 62, 521 
decreasing, 61 
definition of, 58-59 
defining e, 64, 66 
nondecreasing, 60 
strictly increasing, 60 
Serret-Frenet formulas, 426-427 
Set 

boundary of, 119 
bounded, 86, 125, 321, 517 
closed, 119, 321, 515, 516 

complement of, 118, 514 
concept of, 80, 118, 512, 514 
connected, 205, 533 
convex, 205 
denumerable, 512-513 
finite, 512 
infinite, 512 
interior of, 119 
notation, 320 
open, 118, 321, 514 
Simple closed curve, 418 
Simple-connectedness, 475 
Simple surface element, 429 
Singular point, 257, 259 
Smooth curve, 379, 418 
Solenoidal field, 302 
Solid angle, 489 
Solution set, 363 
Space of n-dimensions, 243 
Sphere (in a vector space), 321 
Spherical co-ordinates, 261, 413-414 
Standard basis for R^n, 276
Steepest descent, 344-347, 370-371 
Stieltjes integral, 560ff. 

Stieltjes, T. J., 560 
Stirling’s formula, 649, 699 
Stokes’s theorem, 499, 504 
Surface 

area of a, 437-440, 442-443, 478-480 
closed, 430 
concept of a, 428-430
parametric, 429 
smooth, 430 
Surface element, 429 
Surface integral, 480 

Tangent plane, 135-136, 149 
Tangent vector, 421 
Taylor, Brook, 101 
Taylor’s formula 

for functions of several variables, 207-210 
with derivative remainder, 99, 102 
with integral remainder, 99 
Taylor’s series 

for functions of one variable, 570, 634-635, 651

for functions of several variables, 208 
Tensor, 436 
Tensor analysis, 158 
Toroidal co-ordinates, 263 


Torsion, 426 
Torus, 428, 434 

Transformation, 237, 243, 255, 312, 335 
Translation (of space), 271 
Transpose (of a matrix), 315, 332 
Triangle inequality, 275, 318, 319, 320 
Triple integral, 405 

u-curve, 250, 435 
Uniform continuity, 324, 529-531 
Uniform convergence, 610, 617, 620, 621, 624, 
683 

Union of sets, 120, 320 

Unit sphere (in a vector space), 319 

Unit vector, 319 

Upper bound, 80 

Upper sum, 40, 536 

Variable

dependent, 3 
first or second class, 155 
independent, 2 
limits of integration, 46ff. 
of integration, 46 
v-curve, 250, 435
Vector field, 294 
Vector point function, 293 
Vector product, 280 
Vectors 

algebra of, 270, 310 
concept of, 269, 309-310 
Vector space 
abstract, 311
L(R^n, R^m), 313
R^2, R^3, 268-269
R^n, 274

Vector velocity, 423 

Void (or empty) set, 80, 206, 512 

Volume under a surface, 378 

Wallis’s formula, 697 
Weierstrass, Karl, 517 
Weierstrass M-test, 619 
Work, 453 

x-simple (or y-simple) region, 458
[x], the largest integer function, 2-3 
xy-simple region, 485
xyz-simple region, 487 
xyz-standard region, 487 



